import numpy as np
import pandas as pd
from datetime import datetime
import plotly.express as px
from IPython.display import display, display_html, HTML
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, accuracy_score, confusion_matrix, classification_report, roc_curve
from sklearn.model_selection import learning_curve, cross_val_score, GridSearchCV
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import RobustScaler, StandardScaler, MinMaxScaler
from sklearn.tree import DecisionTreeClassifier
import warnings
warnings.filterwarnings('ignore')
As noted in the project proposal, finding a dataset with accurate and satisfactory data is a challenge in itself, so the choice of dataset can itself be considered a source of bias. As a student, it is nearly impossible to certify rules for data completeness, or to build a dataset yourself by tracking actual data on the internet. This task is not feasible in the short term.
The United Kingdom has a more open and established data source, having implemented a tracking system for traffic incidents starting from 2006. A single year of data consists of three separate sets.
This data can give an accurate description of the situation and the involved parties, along with the resulting damages. Most importantly, it covers the target variable casualty_severity.
To start, each of the datasets is imported.
characteristics = pd.read_csv('/Users/Matt/Desktop/AI/CHALLENGE1/Brit/dft-road-casualty-statistics-accident-2020.csv')
characteristics.name = 'characteristics'
casualty = pd.read_csv('/Users/Matt/Desktop/AI/CHALLENGE1/Brit/dft-road-casualty-statistics-casualty-2020.csv')
casualty.name = 'casualty'
vehicles = pd.read_csv('/Users/Matt/Desktop/AI/CHALLENGE1/Brit/dft-road-casualty-statistics-vehicle-2020.csv')
vehicles.name = 'vehicles'
ref = pd.read_excel('/Users/Matt/Desktop/AI/CHALLENGE1/Brit/Road-Safety-Open-Dataset-Data-Guide.xlsx')
datasets = [characteristics,casualty,vehicles]
characteristics = characteristics.set_index('accident_index')
casualty = casualty.set_index('accident_index')
vehicles = vehicles.set_index('accident_index')
# Use the full option names; the shortened forms only work through pandas' pattern matching.
pd.set_option('display.max_rows',max(characteristics.shape[0],casualty.shape[0],vehicles.shape[0]))
pd.set_option('display.max_columns',max(characteristics.shape[1],casualty.shape[1],vehicles.shape[1]))
for df in datasets:
    print("The dataset",df.name,"has",df.shape[0],"rows and",df.shape[1],"columns")
The dataset characteristics has 91199 rows and 36 columns
The dataset casualty has 115584 rows and 18 columns
The dataset vehicles has 167375 rows and 27 columns
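Setting a `.name` attribute on a DataFrame works for the summary above, but the attribute is not carried over to the new frame that an operation such as `set_index` returns. A minimal sketch of an alternative, assuming a plain dictionary keyed by dataset name; the tiny frames and the `describe` helper are hypothetical placeholders, not the real data or part of the original notebook:

```python
import pandas as pd

# Placeholder frames standing in for the three CSV files (hypothetical values).
frames = {
    "characteristics": pd.DataFrame(
        {"accident_index": ["a1", "a2"], "speed_limit": [20, 30]}
    ),
    "casualty": pd.DataFrame(
        {"accident_index": ["a1", "a1", "a2"], "casualty_severity": [3, 2, 3]}
    ),
    "vehicles": pd.DataFrame(
        {"accident_index": ["a1", "a2", "a2", "a2"], "vehicle_type": [9, 9, 1, 9]}
    ),
}


def describe(frames):
    """Return one summary line per dataset, named by its dictionary key."""
    return [
        f"The dataset {name} has {df.shape[0]} rows and {df.shape[1]} columns"
        for name, df in frames.items()
    ]


summary = describe(frames)
for line in summary:
    print(line)
```

Because the name lives in the dictionary key rather than on the frame, it survives any later reassignment of the individual frames.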
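Here we can observe an interesting trait: the row count of each dataset. There are more casualty rows than accidents, and more vehicle rows than casualties, which suggests that one accident groups several casualties and vehicles. An order of grouping can already be established.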
Let's take a look at each dataset and its features so we can properly confirm this theory.
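One way to check the grouping idea is to test whether `accident_index` is unique in the accident-level table and whether every casualty and vehicle row points to a known accident. The following is only a sketch on toy frames with made-up index values; it assumes `accident_index` is still a regular column, as it is right after `read_csv`:

```python
import pandas as pd

# Toy stand-ins for the three tables (hypothetical index values).
characteristics = pd.DataFrame({"accident_index": ["a1", "a2", "a3"]})
casualty = pd.DataFrame({"accident_index": ["a1", "a1", "a2", "a3"]})
vehicles = pd.DataFrame({"accident_index": ["a1", "a2", "a2", "a3", "a3"]})

# The accident-level table should contain each accident exactly once.
is_unique = characteristics["accident_index"].is_unique

# Every casualty and vehicle row should refer to a known accident.
known = set(characteristics["accident_index"])
casualties_linked = bool(casualty["accident_index"].isin(known).all())
vehicles_linked = bool(vehicles["accident_index"].isin(known).all())

# Number of casualty and vehicle rows attached to each accident.
casualties_per_accident = casualty.groupby("accident_index").size()
vehicles_per_accident = vehicles.groupby("accident_index").size()

print(is_unique, casualties_linked, vehicles_linked)
print(casualties_per_accident)
```

If all three checks hold on the real data, the accident table can be treated as the parent of the other two.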
Each of these datasets already comes partly categorized, as certain features are stored as numeric codes. Each code can be linked directly to the reference table shown under each dataset.
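As an illustration of how a numeric code can be linked to its label, the sketch below builds a code-to-label dictionary from a small frame shaped like the reference table (columns `table`, `field name`, `code/format`, `label`). The `code_labels` helper is hypothetical, and it assumes the codes of the chosen field are all numeric, which does not hold for format entries such as `(DD/MM/YYYY)`:

```python
import pandas as pd

# A small frame shaped like the data guide (rows taken from the guide's layout).
ref = pd.DataFrame(
    {
        "table": ["Accident"] * 4,
        "field name": ["accident_severity"] * 3 + ["accident_year"],
        "code/format": [1, 2, 3, None],
        "label": ["Fatal", "Serious", "Slight", None],
    }
)


def code_labels(ref, table, field):
    """Build a {code: label} dictionary for one field of one table."""
    rows = ref[(ref["table"] == table) & (ref["field name"] == field)]
    rows = rows.dropna(subset=["code/format", "label"])
    return {int(code): label for code, label in zip(rows["code/format"], rows["label"])}


mapping = code_labels(ref, "Accident", "accident_severity")

# Decode a coded column into readable labels.
decoded = pd.Series([3, 3, 2]).map(mapping)
print(decoded.tolist())
```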
As specified previously, this dataset concerns the descriptive features of each accident, mostly where and when it took place. Other features, such as road type, junction details, and weather conditions, can also be observed. These might correlate in an interesting way with the casualty severity in each case.
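A first look at such a relationship could be a normalized cross-tabulation of a condition column against severity. This is a sketch on invented values, using `accident_severity` from the accident table as a stand-in for the target; it shows the share of each severity level per weather code and says nothing about the real data:

```python
import pandas as pd

# Invented sample rows; the real columns hold the coded values from the CSV.
sample = pd.DataFrame(
    {
        "weather_conditions": [1, 1, 1, 2, 2, 1],
        "accident_severity": [3, 3, 2, 3, 2, 3],
    }
)

# Share of each severity level within each weather code (rows sum to 1).
severity_share = pd.crosstab(
    sample["weather_conditions"],
    sample["accident_severity"],
    normalize="index",
)
print(severity_share)
```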
characteristics.head()
| accident_index | accident_year | accident_reference | location_easting_osgr | location_northing_osgr | longitude | latitude | police_force | accident_severity | number_of_vehicles | number_of_casualties | date | day_of_week | time | local_authority_district | local_authority_ons_district | local_authority_highway | first_road_class | first_road_number | road_type | speed_limit | junction_detail | junction_control | second_road_class | second_road_number | pedestrian_crossing_human_control | pedestrian_crossing_physical_facilities | light_conditions | weather_conditions | road_surface_conditions | special_conditions_at_site | carriageway_hazards | urban_or_rural_area | did_police_officer_attend_scene_of_accident | trunk_road_flag | lsoa_of_accident_location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020010219808 | 2020 | 10219808 | 521389.0 | 175144.0 | -0.254001 | 51.462262 | 1 | 3 | 1 | 1 | 04/02/2020 | 3 | 09:00 | 10 | E09000032 | E09000032 | 6 | 0 | 6 | 20 | 0 | -1 | 6 | 0 | 9 | 9 | 1 | 9 | 9 | 0 | 0 | 1 | 3 | 2 | E01004576 |
| 2020010220496 | 2020 | 10220496 | 529337.0 | 176237.0 | -0.139253 | 51.470327 | 1 | 3 | 1 | 2 | 27/04/2020 | 2 | 13:55 | 9 | E09000022 | E09000022 | 3 | 3036 | 6 | 20 | 9 | 2 | 6 | 0 | 0 | 4 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 2 | E01003034 |
| 2020010228005 | 2020 | 10228005 | 526432.0 | 182761.0 | -0.178719 | 51.529614 | 1 | 3 | 1 | 1 | 01/01/2020 | 4 | 01:25 | 1 | E09000033 | E09000033 | 5 | 0 | 6 | 30 | 3 | 1 | 6 | 0 | 0 | 0 | 4 | 1 | 2 | 0 | 0 | 1 | 1 | 2 | E01004726 |
| 2020010228006 | 2020 | 10228006 | 538676.0 | 184371.0 | -0.001683 | 51.541210 | 1 | 2 | 1 | 1 | 01/01/2020 | 4 | 01:50 | 17 | E09000025 | E09000025 | 3 | 11 | 6 | 30 | 0 | -1 | 6 | 0 | 0 | 4 | 4 | 1 | 1 | 0 | 0 | 1 | 1 | 2 | E01003617 |
| 2020010228011 | 2020 | 10228011 | 529324.0 | 181286.0 | -0.137592 | 51.515704 | 1 | 3 | 1 | 2 | 01/01/2020 | 4 | 02:25 | 1 | E09000033 | E09000033 | 3 | 40 | 6 | 30 | 3 | 4 | 5 | 0 | 0 | 0 | 4 | 1 | 1 | 0 | 0 | 1 | 1 | 2 | E01004763 |
An important thing to notice is the accident_index. This index allows cross-referencing with the other datasets.
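A minimal sketch of such a cross-reference, assuming both frames are indexed by `accident_index` as done after loading. The two index values are taken from the preview above, but the casualty rows and their severities are invented for illustration:

```python
import pandas as pd

# Accident-level rows, indexed by accident_index.
characteristics = pd.DataFrame(
    {"speed_limit": [20, 20]},
    index=pd.Index(["2020010219808", "2020010220496"], name="accident_index"),
)

# Casualty-level rows (invented); one accident can appear several times.
casualty = pd.DataFrame(
    {"casualty_severity": [3, 3, 2]},
    index=pd.Index(
        ["2020010219808", "2020010220496", "2020010220496"],
        name="accident_index",
    ),
)

# Each casualty row picks up the columns of the accident it belongs to.
merged = casualty.join(characteristics, how="left")
print(merged)
```

A left join from the casualty table keeps one row per casualty, which matches the level at which casualty_severity is recorded.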
characteristics_ = ref.iloc[:1236]
characteristics_
| | table | field name | code/format | label | note |
|---|---|---|---|---|---|
| 0 | Accident | accident_index | NaN | NaN | unique value for each accident. The accident_i... |
| 1 | Accident | accident_year | NaN | NaN | NaN |
| 2 | Accident | accident_reference | NaN | NaN | In year id used by the police to reference a c... |
| 3 | Accident | location_easting_osgr | NaN | NaN | Null if not known |
| 4 | Accident | location_northing_osgr | NaN | NaN | Null if not known |
| 5 | Accident | longitude | NaN | NaN | Null if not known |
| 6 | Accident | Latitude | NaN | NaN | Null if not known |
| 7 | Accident | police_force | 1 | Metropolitan Police | NaN |
| 8 | Accident | police_force | 3 | Cumbria | NaN |
| 9 | Accident | police_force | 4 | Lancashire | NaN |
| 10 | Accident | police_force | 5 | Merseyside | NaN |
| 11 | Accident | police_force | 6 | Greater Manchester | NaN |
| 12 | Accident | police_force | 7 | Cheshire | NaN |
| 13 | Accident | police_force | 10 | Northumbria | NaN |
| 14 | Accident | police_force | 11 | Durham | NaN |
| 15 | Accident | police_force | 12 | North Yorkshire | NaN |
| 16 | Accident | police_force | 13 | West Yorkshire | NaN |
| 17 | Accident | police_force | 14 | South Yorkshire | NaN |
| 18 | Accident | police_force | 16 | Humberside | NaN |
| 19 | Accident | police_force | 17 | Cleveland | NaN |
| 20 | Accident | police_force | 20 | West Midlands | NaN |
| 21 | Accident | police_force | 21 | Staffordshire | NaN |
| 22 | Accident | police_force | 22 | West Mercia | NaN |
| 23 | Accident | police_force | 23 | Warwickshire | NaN |
| 24 | Accident | police_force | 30 | Derbyshire | NaN |
| 25 | Accident | police_force | 31 | Nottinghamshire | NaN |
| 26 | Accident | police_force | 32 | Lincolnshire | NaN |
| 27 | Accident | police_force | 33 | Leicestershire | NaN |
| 28 | Accident | police_force | 34 | Northamptonshire | NaN |
| 29 | Accident | police_force | 35 | Cambridgeshire | NaN |
| 30 | Accident | police_force | 36 | Norfolk | NaN |
| 31 | Accident | police_force | 37 | Suffolk | NaN |
| 32 | Accident | police_force | 40 | Bedfordshire | NaN |
| 33 | Accident | police_force | 41 | Hertfordshire | NaN |
| 34 | Accident | police_force | 42 | Essex | NaN |
| 35 | Accident | police_force | 43 | Thames Valley | NaN |
| 36 | Accident | police_force | 44 | Hampshire | NaN |
| 37 | Accident | police_force | 45 | Surrey | NaN |
| 38 | Accident | police_force | 46 | Kent | NaN |
| 39 | Accident | police_force | 47 | Sussex | NaN |
| 40 | Accident | police_force | 48 | City of London | NaN |
| 41 | Accident | police_force | 50 | Devon and Cornwall | NaN |
| 42 | Accident | police_force | 52 | Avon and Somerset | NaN |
| 43 | Accident | police_force | 53 | Gloucestershire | NaN |
| 44 | Accident | police_force | 54 | Wiltshire | NaN |
| 45 | Accident | police_force | 55 | Dorset | NaN |
| 46 | Accident | police_force | 60 | North Wales | NaN |
| 47 | Accident | police_force | 61 | Gwent | NaN |
| 48 | Accident | police_force | 62 | South Wales | NaN |
| 49 | Accident | police_force | 63 | Dyfed-Powys | NaN |
| 50 | Accident | police_force | 91 | Northern | category discontinued in 2019 |
| 51 | Accident | police_force | 92 | Grampian | category discontinued in 2019 |
| 52 | Accident | police_force | 93 | Tayside | category discontinued in 2019 |
| 53 | Accident | police_force | 94 | Fife | category discontinued in 2019 |
| 54 | Accident | police_force | 95 | Lothian and Borders | category discontinued in 2019 |
| 55 | Accident | police_force | 96 | Central | category discontinued in 2019 |
| 56 | Accident | police_force | 97 | Strathclyde | category discontinued in 2019 |
| 57 | Accident | police_force | 98 | Dumfries and Galloway | category discontinued in 2019 |
| 58 | Accident | police_force | 99 | Police Scotland | category introduced in 2019 |
| 59 | Accident | accident_severity | 1 | Fatal | NaN |
| 60 | Accident | accident_severity | 2 | Serious | NaN |
| 61 | Accident | accident_severity | 3 | Slight | NaN |
| 62 | Accident | number_of_vehicles | NaN | NaN | NaN |
| 63 | Accident | number_of_casualties | NaN | NaN | NaN |
| 64 | Accident | date | (DD/MM/YYYY) | NaN | NaN |
| 65 | Accident | day_of_week | 1 | Sunday | NaN |
| 66 | Accident | day_of_week | 2 | Monday | NaN |
| 67 | Accident | day_of_week | 3 | Tuesday | NaN |
| 68 | Accident | day_of_week | 4 | Wednesday | NaN |
| 69 | Accident | day_of_week | 5 | Thursday | NaN |
| 70 | Accident | day_of_week | 6 | Friday | NaN |
| 71 | Accident | day_of_week | 7 | Saturday | NaN |
| 72 | Accident | time | (HH:MM) | NaN | Null if not known |
| 73 | Accident | local_authority_district | 1 | Westminster | NaN |
| 74 | Accident | local_authority_district | 2 | Camden | NaN |
| 75 | Accident | local_authority_district | 3 | Islington | NaN |
| 76 | Accident | local_authority_district | 4 | Hackney | NaN |
| 77 | Accident | local_authority_district | 5 | Tower Hamlets | NaN |
| 78 | Accident | local_authority_district | 6 | Greenwich | NaN |
| 79 | Accident | local_authority_district | 7 | Lewisham | NaN |
| 80 | Accident | local_authority_district | 8 | Southwark | NaN |
| 81 | Accident | local_authority_district | 9 | Lambeth | NaN |
| 82 | Accident | local_authority_district | 10 | Wandsworth | NaN |
| 83 | Accident | local_authority_district | 11 | Hammersmith and Fulham | NaN |
| 84 | Accident | local_authority_district | 12 | Kensington and Chelsea | NaN |
| 85 | Accident | local_authority_district | 13 | Waltham Forest | NaN |
| 86 | Accident | local_authority_district | 14 | Redbridge | NaN |
| 87 | Accident | local_authority_district | 15 | Havering | NaN |
| 88 | Accident | local_authority_district | 16 | Barking and Dagenham | NaN |
| 89 | Accident | local_authority_district | 17 | Newham | NaN |
| 90 | Accident | local_authority_district | 18 | Bexley | NaN |
| 91 | Accident | local_authority_district | 19 | Bromley | NaN |
| 92 | Accident | local_authority_district | 20 | Croydon | NaN |
| 93 | Accident | local_authority_district | 21 | Sutton | NaN |
| 94 | Accident | local_authority_district | 22 | Merton | NaN |
| 95 | Accident | local_authority_district | 23 | Kingston upon Thames | NaN |
| 96 | Accident | local_authority_district | 24 | Richmond upon Thames | NaN |
| 97 | Accident | local_authority_district | 25 | Hounslow | NaN |
| 98 | Accident | local_authority_district | 26 | Hillingdon | NaN |
| 99 | Accident | local_authority_district | 27 | Ealing | NaN |
| 100 | Accident | local_authority_district | 28 | Brent | NaN |
| 101 | Accident | local_authority_district | 29 | Harrow | NaN |
| 102 | Accident | local_authority_district | 30 | Barnet | NaN |
| 103 | Accident | local_authority_district | 31 | Haringey | NaN |
| 104 | Accident | local_authority_district | 32 | Enfield | NaN |
| 105 | Accident | local_authority_district | 33 | Hertsmere | NaN |
| 106 | Accident | local_authority_district | 38 | Epsom and Ewell | NaN |
| 107 | Accident | local_authority_district | 40 | Spelthorne | NaN |
| 108 | Accident | local_authority_district | 57 | London Airport (Heathrow) | NaN |
| 109 | Accident | local_authority_district | 60 | Allerdale | NaN |
| 110 | Accident | local_authority_district | 61 | Barrow-in-Furness | NaN |
| 111 | Accident | local_authority_district | 62 | Carlisle | NaN |
| 112 | Accident | local_authority_district | 63 | Copeland | NaN |
| 113 | Accident | local_authority_district | 64 | Eden | NaN |
| 114 | Accident | local_authority_district | 65 | South Lakeland | NaN |
| 115 | Accident | local_authority_district | 70 | Blackburn with Darwen | NaN |
| 116 | Accident | local_authority_district | 71 | Blackpool | NaN |
| 117 | Accident | local_authority_district | 72 | Burnley | NaN |
| 118 | Accident | local_authority_district | 73 | Chorley | NaN |
| 119 | Accident | local_authority_district | 74 | Fylde | NaN |
| 120 | Accident | local_authority_district | 75 | Hyndburn | NaN |
| 121 | Accident | local_authority_district | 76 | Lancaster | NaN |
| 122 | Accident | local_authority_district | 77 | Pendle | NaN |
| 123 | Accident | local_authority_district | 79 | Preston | NaN |
| 124 | Accident | local_authority_district | 80 | Ribble Valley | NaN |
| 125 | Accident | local_authority_district | 82 | Rossendale | NaN |
| 126 | Accident | local_authority_district | 83 | South Ribble | NaN |
| 127 | Accident | local_authority_district | 84 | West Lancashire | NaN |
| 128 | Accident | local_authority_district | 85 | Wyre | NaN |
| 129 | Accident | local_authority_district | 90 | Knowsley | NaN |
| 130 | Accident | local_authority_district | 91 | Liverpool | NaN |
| 131 | Accident | local_authority_district | 92 | St. Helens | NaN |
| 132 | Accident | local_authority_district | 93 | Sefton | NaN |
| 133 | Accident | local_authority_district | 95 | Wirral | NaN |
| 134 | Accident | local_authority_district | 100 | Bolton | NaN |
| 135 | Accident | local_authority_district | 101 | Bury | NaN |
| 136 | Accident | local_authority_district | 102 | Manchester | NaN |
| 137 | Accident | local_authority_district | 104 | Oldham | NaN |
| 138 | Accident | local_authority_district | 106 | Rochdale | NaN |
| 139 | Accident | local_authority_district | 107 | Salford | NaN |
| 140 | Accident | local_authority_district | 109 | Stockport | NaN |
| 141 | Accident | local_authority_district | 110 | Tameside | NaN |
| 142 | Accident | local_authority_district | 112 | Trafford | NaN |
| 143 | Accident | local_authority_district | 114 | Wigan | NaN |
| 144 | Accident | local_authority_district | 120 | Chester | NaN |
| 145 | Accident | local_authority_district | 121 | Congleton | NaN |
| 146 | Accident | local_authority_district | 122 | Crewe and Nantwich | NaN |
| 147 | Accident | local_authority_district | 123 | Ellesmere Port and Neston | NaN |
| 148 | Accident | local_authority_district | 124 | Halton | NaN |
| 149 | Accident | local_authority_district | 126 | Macclesfield | NaN |
| 150 | Accident | local_authority_district | 127 | Vale Royal | NaN |
| 151 | Accident | local_authority_district | 128 | Warrington | NaN |
| 152 | Accident | local_authority_district | 129 | Cheshire East | NaN |
| 153 | Accident | local_authority_district | 130 | Cheshire West and Chester | NaN |
| 154 | Accident | local_authority_district | 139 | Northumberland | NaN |
| 155 | Accident | local_authority_district | 140 | Alnwick | NaN |
| 156 | Accident | local_authority_district | 141 | Berwick-upon-Tweed | NaN |
| 157 | Accident | local_authority_district | 142 | Blyth Valley | NaN |
| 158 | Accident | local_authority_district | 143 | Castle Morpeth | NaN |
| 159 | Accident | local_authority_district | 144 | Tynedale | NaN |
| 160 | Accident | local_authority_district | 145 | Wansbeck | NaN |
| 161 | Accident | local_authority_district | 146 | Gateshead | NaN |
| 162 | Accident | local_authority_district | 147 | Newcastle upon Tyne | NaN |
| 163 | Accident | local_authority_district | 148 | North Tyneside | NaN |
| 164 | Accident | local_authority_district | 149 | South Tyneside | NaN |
| 165 | Accident | local_authority_district | 150 | Sunderland | NaN |
| 166 | Accident | local_authority_district | 160 | Chester-le-Street | NaN |
| 167 | Accident | local_authority_district | 161 | Darlington | NaN |
| 168 | Accident | local_authority_district | 162 | Derwentside | NaN |
| 169 | Accident | local_authority_district | 163 | Durham | NaN |
| 170 | Accident | local_authority_district | 164 | Easington | NaN |
| 171 | Accident | local_authority_district | 165 | Sedgefield | NaN |
| 172 | Accident | local_authority_district | 166 | Teesdale | NaN |
| 173 | Accident | local_authority_district | 168 | Wear Valley | NaN |
| 174 | Accident | local_authority_district | 169 | County Durham | NaN |
| 175 | Accident | local_authority_district | 180 | Craven | NaN |
| 176 | Accident | local_authority_district | 181 | Hambleton | NaN |
| 177 | Accident | local_authority_district | 182 | Harrogate | NaN |
| 178 | Accident | local_authority_district | 184 | Richmondshire | NaN |
| 179 | Accident | local_authority_district | 185 | Ryedale | NaN |
| 180 | Accident | local_authority_district | 186 | Scarborough | NaN |
| 181 | Accident | local_authority_district | 187 | Selby | NaN |
| 182 | Accident | local_authority_district | 189 | York | NaN |
| 183 | Accident | local_authority_district | 200 | Bradford | NaN |
| 184 | Accident | local_authority_district | 202 | Calderdale | NaN |
| 185 | Accident | local_authority_district | 203 | Kirklees | NaN |
| 186 | Accident | local_authority_district | 204 | Leeds | NaN |
| 187 | Accident | local_authority_district | 206 | Wakefield | NaN |
| 188 | Accident | local_authority_district | 210 | Barnsley | NaN |
| 189 | Accident | local_authority_district | 211 | Doncaster | NaN |
| 190 | Accident | local_authority_district | 213 | Rotherham | NaN |
| 191 | Accident | local_authority_district | 215 | Sheffield | NaN |
| 192 | Accident | local_authority_district | 228 | Kingston upon Hull, City of | NaN |
| 193 | Accident | local_authority_district | 231 | East Riding of Yorkshire | NaN |
| 194 | Accident | local_authority_district | 232 | North Lincolnshire | NaN |
| 195 | Accident | local_authority_district | 233 | North East Lincolnshire | NaN |
| 196 | Accident | local_authority_district | 240 | Hartlepool | NaN |
| 197 | Accident | local_authority_district | 241 | Redcar and Cleveland | NaN |
| 198 | Accident | local_authority_district | 243 | Middlesbrough | NaN |
| 199 | Accident | local_authority_district | 245 | Stockton-on-Tees | NaN |
| 200 | Accident | local_authority_district | 250 | Cannock Chase | NaN |
| 201 | Accident | local_authority_district | 251 | East Staffordshire | NaN |
| 202 | Accident | local_authority_district | 252 | Lichfield | NaN |
| 203 | Accident | local_authority_district | 253 | Newcastle-under-Lyme | NaN |
| 204 | Accident | local_authority_district | 254 | South Staffordshire | NaN |
| 205 | Accident | local_authority_district | 255 | Stafford | NaN |
| 206 | Accident | local_authority_district | 256 | Staffordshire Moorlands | NaN |
| 207 | Accident | local_authority_district | 257 | Stoke-on-Trent | NaN |
| 208 | Accident | local_authority_district | 258 | Tamworth | NaN |
| 209 | Accident | local_authority_district | 270 | Bromsgrove | NaN |
| 210 | Accident | local_authority_district | 273 | Malvern Hills | NaN |
| 211 | Accident | local_authority_district | 274 | Redditch | NaN |
| 212 | Accident | local_authority_district | 276 | Worcester | NaN |
| 213 | Accident | local_authority_district | 277 | Wychavon | NaN |
| 214 | Accident | local_authority_district | 278 | Wyre Forest | NaN |
| 215 | Accident | local_authority_district | 279 | Bridgnorth | NaN |
| 216 | Accident | local_authority_district | 280 | North Shropshire | NaN |
| 217 | Accident | local_authority_district | 281 | Oswestry | NaN |
| 218 | Accident | local_authority_district | 282 | Shrewsbury and Atcham | NaN |
| 219 | Accident | local_authority_district | 283 | South Shropshire | NaN |
| 220 | Accident | local_authority_district | 284 | Telford and Wrekin | NaN |
| 221 | Accident | local_authority_district | 285 | Herefordshire, County of | NaN |
| 222 | Accident | local_authority_district | 286 | Shropshire | NaN |
| 223 | Accident | local_authority_district | 290 | North Warwickshire | NaN |
| 224 | Accident | local_authority_district | 291 | Nuneaton and Bedworth | NaN |
| 225 | Accident | local_authority_district | 292 | Rugby | NaN |
| 226 | Accident | local_authority_district | 293 | Stratford-upon-Avon | NaN |
| 227 | Accident | local_authority_district | 294 | Warwick | NaN |
| 228 | Accident | local_authority_district | 300 | Birmingham | NaN |
| 229 | Accident | local_authority_district | 302 | Coventry | NaN |
| 230 | Accident | local_authority_district | 303 | Dudley | NaN |
| 231 | Accident | local_authority_district | 305 | Sandwell | NaN |
| 232 | Accident | local_authority_district | 306 | Solihull | NaN |
| 233 | Accident | local_authority_district | 307 | Walsall | NaN |
| 234 | Accident | local_authority_district | 309 | Wolverhampton | NaN |
| 235 | Accident | local_authority_district | 320 | Amber Valley | NaN |
| 236 | Accident | local_authority_district | 321 | Bolsover | NaN |
| 237 | Accident | local_authority_district | 322 | Chesterfield | NaN |
| 238 | Accident | local_authority_district | 323 | Derby | NaN |
| 239 | Accident | local_authority_district | 324 | Erewash | NaN |
| 240 | Accident | local_authority_district | 325 | High Peak | NaN |
| 241 | Accident | local_authority_district | 327 | North East Derbyshire | NaN |
| 242 | Accident | local_authority_district | 328 | South Derbyshire | NaN |
| 243 | Accident | local_authority_district | 329 | Derbyshire Dales | NaN |
| 244 | Accident | local_authority_district | 340 | Ashfield | NaN |
| 245 | Accident | local_authority_district | 341 | Bassetlaw | NaN |
| 246 | Accident | local_authority_district | 342 | Broxtowe | NaN |
| 247 | Accident | local_authority_district | 343 | Gedling | NaN |
| 248 | Accident | local_authority_district | 344 | Mansfield | NaN |
| 249 | Accident | local_authority_district | 345 | Newark and Sherwood | NaN |
| 250 | Accident | local_authority_district | 346 | Nottingham | NaN |
| 251 | Accident | local_authority_district | 347 | Rushcliffe | NaN |
| 252 | Accident | local_authority_district | 350 | Boston | NaN |
| 253 | Accident | local_authority_district | 351 | East Lindsey | NaN |
| 254 | Accident | local_authority_district | 352 | Lincoln | NaN |
| 255 | Accident | local_authority_district | 353 | North Kesteven | NaN |
| 256 | Accident | local_authority_district | 354 | South Holland | NaN |
| 257 | Accident | local_authority_district | 355 | South Kesteven | NaN |
| 258 | Accident | local_authority_district | 356 | West Lindsey | NaN |
| 259 | Accident | local_authority_district | 360 | Blaby | NaN |
| 260 | Accident | local_authority_district | 361 | Hinckley and Bosworth | NaN |
| 261 | Accident | local_authority_district | 362 | Charnwood | NaN |
| 262 | Accident | local_authority_district | 363 | Harborough | NaN |
| 263 | Accident | local_authority_district | 364 | Leicester | NaN |
| 264 | Accident | local_authority_district | 365 | Melton | NaN |
| 265 | Accident | local_authority_district | 366 | North West Leicestershire | NaN |
| 266 | Accident | local_authority_district | 367 | Oadby and Wigston | NaN |
| 267 | Accident | local_authority_district | 368 | Rutland | NaN |
| 268 | Accident | local_authority_district | 380 | Corby | NaN |
| 269 | Accident | local_authority_district | 381 | Daventry | NaN |
| 270 | Accident | local_authority_district | 382 | East Northamptonshire | NaN |
| 271 | Accident | local_authority_district | 383 | Kettering | NaN |
| 272 | Accident | local_authority_district | 384 | Northampton | NaN |
| 273 | Accident | local_authority_district | 385 | South Northamptonshire | NaN |
| 274 | Accident | local_authority_district | 386 | Wellingborough | NaN |
| 275 | Accident | local_authority_district | 390 | Cambridge | NaN |
| 276 | Accident | local_authority_district | 391 | East Cambridgeshire | NaN |
| 277 | Accident | local_authority_district | 392 | Fenland | NaN |
| 278 | Accident | local_authority_district | 393 | Huntingdonshire | NaN |
| 279 | Accident | local_authority_district | 394 | Peterborough | NaN |
| 280 | Accident | local_authority_district | 395 | South Cambridgeshire | NaN |
| 281 | Accident | local_authority_district | 400 | Breckland | NaN |
| 282 | Accident | local_authority_district | 401 | Broadland | NaN |
| 283 | Accident | local_authority_district | 402 | Great Yarmouth | NaN |
| 284 | Accident | local_authority_district | 404 | Norwich | NaN |
| 285 | Accident | local_authority_district | 405 | North Norfolk | NaN |
| 286 | Accident | local_authority_district | 406 | South Norfolk | NaN |
| 287 | Accident | local_authority_district | 407 | King's Lynn and West Norfolk | NaN |
| 288 | Accident | local_authority_district | 410 | Babergh | NaN |
| 289 | Accident | local_authority_district | 411 | Forest Heath | NaN |
| 290 | Accident | local_authority_district | 412 | Ipswich | NaN |
| 291 | Accident | local_authority_district | 413 | Mid Suffolk | NaN |
| 292 | Accident | local_authority_district | 414 | St. Edmundsbury | NaN |
| 293 | Accident | local_authority_district | 415 | Suffolk Coastal | NaN |
| 294 | Accident | local_authority_district | 416 | Waveney | NaN |
| 295 | Accident | local_authority_district | 420 | Bedford | NaN |
| 296 | Accident | local_authority_district | 421 | Luton | NaN |
| 297 | Accident | local_authority_district | 422 | Mid Bedfordshire | NaN |
| 298 | Accident | local_authority_district | 423 | South Bedfordshire | NaN |
| 299 | Accident | local_authority_district | 424 | Central Bedfordshire | NaN |
| 300 | Accident | local_authority_district | 430 | Broxbourne | NaN |
| 301 | Accident | local_authority_district | 431 | Dacorum | NaN |
| 302 | Accident | local_authority_district | 432 | East Hertfordshire | NaN |
| 303 | Accident | local_authority_district | 433 | North Hertfordshire | NaN |
| 304 | Accident | local_authority_district | 434 | St. Albans | NaN |
| 305 | Accident | local_authority_district | 435 | Stevenage | NaN |
| 306 | Accident | local_authority_district | 436 | Three Rivers | NaN |
| 307 | Accident | local_authority_district | 437 | Watford | NaN |
| 308 | Accident | local_authority_district | 438 | Welwyn Hatfield | NaN |
| 309 | Accident | local_authority_district | 450 | Basildon | NaN |
| 310 | Accident | local_authority_district | 451 | Braintree | NaN |
| 311 | Accident | local_authority_district | 452 | Brentwood | NaN |
| 312 | Accident | local_authority_district | 453 | Castle Point | NaN |
| 313 | Accident | local_authority_district | 454 | Chelmsford | NaN |
| 314 | Accident | local_authority_district | 455 | Colchester | NaN |
| 315 | Accident | local_authority_district | 456 | Epping Forest | NaN |
| 316 | Accident | local_authority_district | 457 | Harlow | NaN |
| 317 | Accident | local_authority_district | 458 | Maldon | NaN |
| 318 | Accident | local_authority_district | 459 | Rochford | NaN |
| 319 | Accident | local_authority_district | 460 | Southend-on-Sea | NaN |
| 320 | Accident | local_authority_district | 461 | Tendring | NaN |
| 321 | Accident | local_authority_district | 462 | Thurrock | NaN |
| 322 | Accident | local_authority_district | 463 | Uttlesford | NaN |
| 323 | Accident | local_authority_district | 470 | Bracknell Forest | NaN |
| 324 | Accident | local_authority_district | 471 | West Berkshire | NaN |
| 325 | Accident | local_authority_district | 472 | Reading | NaN |
| 326 | Accident | local_authority_district | 473 | Slough | NaN |
| 327 | Accident | local_authority_district | 474 | Windsor and Maidenhead | NaN |
| 328 | Accident | local_authority_district | 475 | Wokingham | NaN |
| 329 | Accident | local_authority_district | 476 | Aylesbury Vale | NaN |
| 330 | Accident | local_authority_district | 477 | South Bucks | NaN |
| 331 | Accident | local_authority_district | 478 | Chiltern | NaN |
| 332 | Accident | local_authority_district | 479 | Milton Keynes | NaN |
| 333 | Accident | local_authority_district | 480 | Wycombe | NaN |
| 334 | Accident | local_authority_district | 481 | Cherwell | NaN |
| 335 | Accident | local_authority_district | 482 | Oxford | NaN |
| 336 | Accident | local_authority_district | 483 | Vale of White Horse | NaN |
| 337 | Accident | local_authority_district | 484 | South Oxfordshire | NaN |
| 338 | Accident | local_authority_district | 485 | West Oxfordshire | NaN |
| 339 | Accident | local_authority_district | 490 | Basingstoke and Deane | NaN |
| 340 | Accident | local_authority_district | 491 | Eastleigh | NaN |
| 341 | Accident | local_authority_district | 492 | Fareham | NaN |
| 342 | Accident | local_authority_district | 493 | Gosport | NaN |
| 343 | Accident | local_authority_district | 494 | Hart | NaN |
| 344 | Accident | local_authority_district | 495 | Havant | NaN |
| 345 | Accident | local_authority_district | 496 | New Forest | NaN |
| 346 | Accident | local_authority_district | 497 | East Hampshire | NaN |
| 347 | Accident | local_authority_district | 498 | Portsmouth | NaN |
| 348 | Accident | local_authority_district | 499 | Rushmoor | NaN |
| 349 | Accident | local_authority_district | 500 | Southampton | NaN |
| 350 | Accident | local_authority_district | 501 | Test Valley | NaN |
| 351 | Accident | local_authority_district | 502 | Winchester | NaN |
| 352 | Accident | local_authority_district | 505 | Isle of Wight | NaN |
| 353 | Accident | local_authority_district | 510 | Elmbridge | NaN |
| 354 | Accident | local_authority_district | 511 | Guildford | NaN |
| 355 | Accident | local_authority_district | 512 | Mole Valley | NaN |
| 356 | Accident | local_authority_district | 513 | Reigate and Banstead | NaN |
| 357 | Accident | local_authority_district | 514 | Runnymede | NaN |
| 358 | Accident | local_authority_district | 515 | Surrey Heath | NaN |
| 359 | Accident | local_authority_district | 516 | Tandridge | NaN |
| 360 | Accident | local_authority_district | 517 | Waverley | NaN |
| 361 | Accident | local_authority_district | 518 | Woking | NaN |
| 362 | Accident | local_authority_district | 530 | Ashford | NaN |
| 363 | Accident | local_authority_district | 531 | Canterbury | NaN |
| 364 | Accident | local_authority_district | 532 | Dartford | NaN |
| 365 | Accident | local_authority_district | 533 | Dover | NaN |
| 366 | Accident | local_authority_district | 535 | Gravesham | NaN |
| 367 | Accident | local_authority_district | 536 | Maidstone | NaN |
| 368 | Accident | local_authority_district | 538 | Sevenoaks | NaN |
| 369 | Accident | local_authority_district | 539 | Shepway | NaN |
| 370 | Accident | local_authority_district | 540 | Swale | NaN |
| 371 | Accident | local_authority_district | 541 | Thanet | NaN |
| 372 | Accident | local_authority_district | 542 | Tonbridge and Malling | NaN |
| 373 | Accident | local_authority_district | 543 | Tunbridge Wells | NaN |
| 374 | Accident | local_authority_district | 544 | Medway | NaN |
| 375 | Accident | local_authority_district | 551 | Eastbourne | NaN |
| 376 | Accident | local_authority_district | 552 | Hastings | NaN |
| 377 | Accident | local_authority_district | 554 | Lewes | NaN |
| 378 | Accident | local_authority_district | 555 | Rother | NaN |
| 379 | Accident | local_authority_district | 556 | Wealden | NaN |
| 380 | Accident | local_authority_district | 557 | Adur | NaN |
| 381 | Accident | local_authority_district | 558 | Arun | NaN |
| 382 | Accident | local_authority_district | 559 | Chichester | NaN |
| 383 | Accident | local_authority_district | 560 | Crawley | NaN |
| 384 | Accident | local_authority_district | 562 | Horsham | NaN |
| 385 | Accident | local_authority_district | 563 | Mid Sussex | NaN |
| 386 | Accident | local_authority_district | 564 | Worthing | NaN |
| 387 | Accident | local_authority_district | 565 | Brighton and Hove | NaN |
| 388 | Accident | local_authority_district | 570 | City of London | NaN |
| 389 | Accident | local_authority_district | 580 | East Devon | NaN |
| 390 | Accident | local_authority_district | 581 | Exeter | NaN |
| 391 | Accident | local_authority_district | 582 | North Devon | NaN |
| 392 | Accident | local_authority_district | 583 | Plymouth | NaN |
| 393 | Accident | local_authority_district | 584 | South Hams | NaN |
| 394 | Accident | local_authority_district | 585 | Teignbridge | NaN |
| 395 | Accident | local_authority_district | 586 | Mid Devon | NaN |
| 396 | Accident | local_authority_district | 587 | Torbay | NaN |
| 397 | Accident | local_authority_district | 588 | Torridge | NaN |
| 398 | Accident | local_authority_district | 589 | West Devon | NaN |
| 399 | Accident | local_authority_district | 590 | Caradon | NaN |
| 400 | Accident | local_authority_district | 591 | Carrick | NaN |
| 401 | Accident | local_authority_district | 592 | Kerrier | NaN |
| 402 | Accident | local_authority_district | 593 | North Cornwall | NaN |
| 403 | Accident | local_authority_district | 594 | Penwith | NaN |
| 404 | Accident | local_authority_district | 595 | Restormel | NaN |
| 405 | Accident | local_authority_district | 596 | Cornwall | NaN |
| 406 | Accident | local_authority_district | 601 | Bristol, City of | NaN |
| 407 | Accident | local_authority_district | 605 | North Somerset | NaN |
| 408 | Accident | local_authority_district | 606 | Mendip | NaN |
| 409 | Accident | local_authority_district | 607 | Sedgemoor | NaN |
| 410 | Accident | local_authority_district | 608 | Taunton Deane | NaN |
| 411 | Accident | local_authority_district | 609 | West Somerset | NaN |
| 412 | Accident | local_authority_district | 610 | South Somerset | NaN |
| 413 | Accident | local_authority_district | 611 | Bath and North East Somerset | NaN |
| 414 | Accident | local_authority_district | 612 | South Gloucestershire | NaN |
| 415 | Accident | local_authority_district | 620 | Cheltenham | NaN |
| 416 | Accident | local_authority_district | 621 | Cotswold | NaN |
| 417 | Accident | local_authority_district | 622 | Forest of Dean | NaN |
| 418 | Accident | local_authority_district | 623 | Gloucester | NaN |
| 419 | Accident | local_authority_district | 624 | Stroud | NaN |
| 420 | Accident | local_authority_district | 625 | Tewkesbury | NaN |
| 421 | Accident | local_authority_district | 630 | Kennet | NaN |
| 422 | Accident | local_authority_district | 631 | North Wiltshire | NaN |
| 423 | Accident | local_authority_district | 632 | Salisbury | NaN |
| 424 | Accident | local_authority_district | 633 | Swindon | NaN |
| 425 | Accident | local_authority_district | 634 | West Wiltshire | NaN |
| 426 | Accident | local_authority_district | 635 | Wiltshire | NaN |
| 427 | Accident | local_authority_district | 640 | Bournemouth | NaN |
| 428 | Accident | local_authority_district | 641 | Christchurch | NaN |
| 429 | Accident | local_authority_district | 642 | North Dorset | NaN |
| 430 | Accident | local_authority_district | 643 | Poole | NaN |
| 431 | Accident | local_authority_district | 644 | Purbeck | NaN |
| 432 | Accident | local_authority_district | 645 | West Dorset | NaN |
| 433 | Accident | local_authority_district | 646 | Weymouth and Portland | NaN |
| 434 | Accident | local_authority_district | 647 | East Dorset | NaN |
| 435 | Accident | local_authority_district | 720 | Isle of Anglesey | NaN |
| 436 | Accident | local_authority_district | 721 | Conwy | NaN |
| 437 | Accident | local_authority_district | 722 | Gwynedd | NaN |
| 438 | Accident | local_authority_district | 723 | Denbighshire | NaN |
| 439 | Accident | local_authority_district | 724 | Flintshire | NaN |
| 440 | Accident | local_authority_district | 725 | Wrexham | NaN |
| 441 | Accident | local_authority_district | 730 | Blaenau Gwent | NaN |
| 442 | Accident | local_authority_district | 731 | Caerphilly | NaN |
| 443 | Accident | local_authority_district | 732 | Monmouthshire | NaN |
| 444 | Accident | local_authority_district | 733 | Newport | NaN |
| 445 | Accident | local_authority_district | 734 | Torfaen | NaN |
| 446 | Accident | local_authority_district | 740 | Bridgend | NaN |
| 447 | Accident | local_authority_district | 741 | Cardiff | NaN |
| 448 | Accident | local_authority_district | 742 | Merthyr Tydfil | NaN |
| 449 | Accident | local_authority_district | 743 | Neath Port Talbot | NaN |
| 450 | Accident | local_authority_district | 744 | Rhondda, Cynon, Taff | NaN |
| 451 | Accident | local_authority_district | 745 | Swansea | NaN |
| 452 | Accident | local_authority_district | 746 | The Vale of Glamorgan | NaN |
| 453 | Accident | local_authority_district | 750 | Ceredigion | NaN |
| 454 | Accident | local_authority_district | 751 | Carmarthenshire | NaN |
| 455 | Accident | local_authority_district | 752 | Pembrokeshire | NaN |
| 456 | Accident | local_authority_district | 753 | Powys | NaN |
| 457 | Accident | local_authority_district | 910 | Aberdeen City | NaN |
| 458 | Accident | local_authority_district | 911 | Aberdeenshire | NaN |
| 459 | Accident | local_authority_district | 912 | Angus | NaN |
| 460 | Accident | local_authority_district | 913 | Argyll and Bute | NaN |
| 461 | Accident | local_authority_district | 914 | Scottish Borders | NaN |
| 462 | Accident | local_authority_district | 915 | Clackmannanshire | NaN |
| 463 | Accident | local_authority_district | 916 | West Dunbartonshire | NaN |
| 464 | Accident | local_authority_district | 917 | Dumfries and Galloway | NaN |
| 465 | Accident | local_authority_district | 918 | Dundee City | NaN |
| 466 | Accident | local_authority_district | 919 | East Ayrshire | NaN |
| 467 | Accident | local_authority_district | 920 | East Dunbartonshire | NaN |
| 468 | Accident | local_authority_district | 921 | East Lothian | NaN |
| 469 | Accident | local_authority_district | 922 | East Renfrewshire | NaN |
| 470 | Accident | local_authority_district | 923 | Edinburgh, City of | NaN |
| 471 | Accident | local_authority_district | 924 | Falkirk | NaN |
| 472 | Accident | local_authority_district | 925 | Fife | NaN |
| 473 | Accident | local_authority_district | 926 | Glasgow City | NaN |
| 474 | Accident | local_authority_district | 927 | Highland | NaN |
| 475 | Accident | local_authority_district | 928 | Inverclyde | NaN |
| 476 | Accident | local_authority_district | 929 | Midlothian | NaN |
| 477 | Accident | local_authority_district | 930 | Moray | NaN |
| 478 | Accident | local_authority_district | 931 | North Ayrshire | NaN |
| 479 | Accident | local_authority_district | 932 | North Lanarkshire | NaN |
| 480 | Accident | local_authority_district | 933 | Orkney Islands | NaN |
| 481 | Accident | local_authority_district | 934 | Perth and Kinross | NaN |
| 482 | Accident | local_authority_district | 935 | Renfrewshire | NaN |
| 483 | Accident | local_authority_district | 936 | Shetland Islands | NaN |
| 484 | Accident | local_authority_district | 937 | South Ayrshire | NaN |
| 485 | Accident | local_authority_district | 938 | South Lanarkshire | NaN |
| 486 | Accident | local_authority_district | 939 | Stirling | NaN |
| 487 | Accident | local_authority_district | 940 | West Lothian | NaN |
| 488 | Accident | local_authority_district | 941 | Western Isles | NaN |
| 489 | Accident | local_authority_ons_district | E06000001 | Hartlepool | NaN |
| 490 | Accident | local_authority_ons_district | E06000002 | Middlesbrough | NaN |
| 491 | Accident | local_authority_ons_district | E06000003 | Redcar and Cleveland | NaN |
| 492 | Accident | local_authority_ons_district | E06000004 | Stockton-on-Tees | NaN |
| 493 | Accident | local_authority_ons_district | E06000005 | Darlington | NaN |
| 494 | Accident | local_authority_ons_district | E06000006 | Halton | NaN |
| 495 | Accident | local_authority_ons_district | E06000007 | Warrington | NaN |
| 496 | Accident | local_authority_ons_district | E06000008 | Blackburn with Darwen | NaN |
| 497 | Accident | local_authority_ons_district | E06000009 | Blackpool | NaN |
| 498 | Accident | local_authority_ons_district | E06000010 | Kingston upon Hull, City of | NaN |
| 499 | Accident | local_authority_ons_district | E06000011 | East Riding of Yorkshire | NaN |
| 500 | Accident | local_authority_ons_district | E06000012 | North East Lincolnshire | NaN |
| 501 | Accident | local_authority_ons_district | E06000013 | North Lincolnshire | NaN |
| 502 | Accident | local_authority_ons_district | E06000014 | York | NaN |
| 503 | Accident | local_authority_ons_district | E06000015 | Derby | NaN |
| 504 | Accident | local_authority_ons_district | E06000016 | Leicester | NaN |
| 505 | Accident | local_authority_ons_district | E06000017 | Rutland | NaN |
| 506 | Accident | local_authority_ons_district | E06000018 | Nottingham | NaN |
| 507 | Accident | local_authority_ons_district | E06000019 | Herefordshire, County of | NaN |
| 508 | Accident | local_authority_ons_district | E06000020 | Telford and Wrekin | NaN |
| 509 | Accident | local_authority_ons_district | E06000021 | Stoke-on-Trent | NaN |
| 510 | Accident | local_authority_ons_district | E06000022 | Bath and North East Somerset | NaN |
| 511 | Accident | local_authority_ons_district | E06000023 | Bristol, City of | NaN |
| 512 | Accident | local_authority_ons_district | E06000024 | North Somerset | NaN |
| 513 | Accident | local_authority_ons_district | E06000025 | South Gloucestershire | NaN |
| 514 | Accident | local_authority_ons_district | E06000026 | Plymouth | NaN |
| 515 | Accident | local_authority_ons_district | E06000027 | Torbay | NaN |
| 516 | Accident | local_authority_ons_district | E06000028 | Bournemouth | NaN |
| 517 | Accident | local_authority_ons_district | E06000029 | Poole | NaN |
| 518 | Accident | local_authority_ons_district | E06000030 | Swindon | NaN |
| 519 | Accident | local_authority_ons_district | E06000031 | Peterborough | NaN |
| 520 | Accident | local_authority_ons_district | E06000032 | Luton | NaN |
| 521 | Accident | local_authority_ons_district | E06000033 | Southend-on-Sea | NaN |
| 522 | Accident | local_authority_ons_district | E06000034 | Thurrock | NaN |
| 523 | Accident | local_authority_ons_district | E06000035 | Medway | NaN |
| 524 | Accident | local_authority_ons_district | E06000036 | Bracknell Forest | NaN |
| 525 | Accident | local_authority_ons_district | E06000037 | West Berkshire | NaN |
| 526 | Accident | local_authority_ons_district | E06000038 | Reading | NaN |
| 527 | Accident | local_authority_ons_district | E06000039 | Slough | NaN |
| 528 | Accident | local_authority_ons_district | E06000040 | Windsor and Maidenhead | NaN |
| 529 | Accident | local_authority_ons_district | E06000041 | Wokingham | NaN |
| 530 | Accident | local_authority_ons_district | E06000042 | Milton Keynes | NaN |
| 531 | Accident | local_authority_ons_district | E06000043 | Brighton and Hove | NaN |
| 532 | Accident | local_authority_ons_district | E06000044 | Portsmouth | NaN |
| 533 | Accident | local_authority_ons_district | E06000045 | Southampton | NaN |
| 534 | Accident | local_authority_ons_district | E06000046 | Isle of Wight | NaN |
| 535 | Accident | local_authority_ons_district | E06000047 | County Durham | NaN |
| 536 | Accident | local_authority_ons_district | E06000048 | Northumberland | NaN |
| 537 | Accident | local_authority_ons_district | E06000049 | Cheshire East | NaN |
| 538 | Accident | local_authority_ons_district | E06000050 | Cheshire West and Chester | NaN |
| 539 | Accident | local_authority_ons_district | E06000051 | Shropshire | NaN |
| 540 | Accident | local_authority_ons_district | E06000052 | Cornwall | NaN |
| 541 | Accident | local_authority_ons_district | E06000053 | Isles of Scilly | NaN |
| 542 | Accident | local_authority_ons_district | E06000054 | Wiltshire | NaN |
| 543 | Accident | local_authority_ons_district | E06000055 | Bedford | NaN |
| 544 | Accident | local_authority_ons_district | E06000056 | Central Bedfordshire | NaN |
| 545 | Accident | local_authority_ons_district | E07000004 | Aylesbury Vale | NaN |
| 546 | Accident | local_authority_ons_district | E07000005 | Chiltern | NaN |
| 547 | Accident | local_authority_ons_district | E07000006 | South Bucks | NaN |
| 548 | Accident | local_authority_ons_district | E07000007 | Wycombe | NaN |
| 549 | Accident | local_authority_ons_district | E07000008 | Cambridge | NaN |
| 550 | Accident | local_authority_ons_district | E07000009 | East Cambridgeshire | NaN |
| 551 | Accident | local_authority_ons_district | E07000010 | Fenland | NaN |
| 552 | Accident | local_authority_ons_district | E07000011 | Huntingdonshire | NaN |
| 553 | Accident | local_authority_ons_district | E07000012 | South Cambridgeshire | NaN |
| 554 | Accident | local_authority_ons_district | E07000026 | Allerdale | NaN |
| 555 | Accident | local_authority_ons_district | E07000027 | Barrow-in-Furness | NaN |
| 556 | Accident | local_authority_ons_district | E07000028 | Carlisle | NaN |
| 557 | Accident | local_authority_ons_district | E07000029 | Copeland | NaN |
| 558 | Accident | local_authority_ons_district | E07000030 | Eden | NaN |
| 559 | Accident | local_authority_ons_district | E07000031 | South Lakeland | NaN |
| 560 | Accident | local_authority_ons_district | E07000032 | Amber Valley | NaN |
| 561 | Accident | local_authority_ons_district | E07000033 | Bolsover | NaN |
| 562 | Accident | local_authority_ons_district | E07000034 | Chesterfield | NaN |
| 563 | Accident | local_authority_ons_district | E07000035 | Derbyshire Dales | NaN |
| 564 | Accident | local_authority_ons_district | E07000036 | Erewash | NaN |
| 565 | Accident | local_authority_ons_district | E07000037 | High Peak | NaN |
| 566 | Accident | local_authority_ons_district | E07000038 | North East Derbyshire | NaN |
| 567 | Accident | local_authority_ons_district | E07000039 | South Derbyshire | NaN |
| 568 | Accident | local_authority_ons_district | E07000040 | East Devon | NaN |
| 569 | Accident | local_authority_ons_district | E07000041 | Exeter | NaN |
| 570 | Accident | local_authority_ons_district | E07000042 | Mid Devon | NaN |
| 571 | Accident | local_authority_ons_district | E07000043 | North Devon | NaN |
| 572 | Accident | local_authority_ons_district | E07000044 | South Hams | NaN |
| 573 | Accident | local_authority_ons_district | E07000045 | Teignbridge | NaN |
| 574 | Accident | local_authority_ons_district | E07000046 | Torridge | NaN |
| 575 | Accident | local_authority_ons_district | E07000047 | West Devon | NaN |
| 576 | Accident | local_authority_ons_district | E07000048 | Christchurch | NaN |
| 577 | Accident | local_authority_ons_district | E07000049 | East Dorset | NaN |
| 578 | Accident | local_authority_ons_district | E07000050 | North Dorset | NaN |
| 579 | Accident | local_authority_ons_district | E07000051 | Purbeck | NaN |
| 580 | Accident | local_authority_ons_district | E07000052 | West Dorset | NaN |
| 581 | Accident | local_authority_ons_district | E07000053 | Weymouth and Portland | NaN |
| 582 | Accident | local_authority_ons_district | E07000061 | Eastbourne | NaN |
| 583 | Accident | local_authority_ons_district | E07000062 | Hastings | NaN |
| 584 | Accident | local_authority_ons_district | E07000063 | Lewes | NaN |
| 585 | Accident | local_authority_ons_district | E07000064 | Rother | NaN |
| 586 | Accident | local_authority_ons_district | E07000065 | Wealden | NaN |
| 587 | Accident | local_authority_ons_district | E07000066 | Basildon | NaN |
| 588 | Accident | local_authority_ons_district | E07000067 | Braintree | NaN |
| 589 | Accident | local_authority_ons_district | E07000068 | Brentwood | NaN |
| 590 | Accident | local_authority_ons_district | E07000069 | Castle Point | NaN |
| 591 | Accident | local_authority_ons_district | E07000070 | Chelmsford | NaN |
| 592 | Accident | local_authority_ons_district | E07000071 | Colchester | NaN |
| 593 | Accident | local_authority_ons_district | E07000072 | Epping Forest | NaN |
| 594 | Accident | local_authority_ons_district | E07000073 | Harlow | NaN |
| 595 | Accident | local_authority_ons_district | E07000074 | Maldon | NaN |
| 596 | Accident | local_authority_ons_district | E07000075 | Rochford | NaN |
| 597 | Accident | local_authority_ons_district | E07000076 | Tendring | NaN |
| 598 | Accident | local_authority_ons_district | E07000077 | Uttlesford | NaN |
| 599 | Accident | local_authority_ons_district | E07000078 | Cheltenham | NaN |
| 600 | Accident | local_authority_ons_district | E07000079 | Cotswold | NaN |
| 601 | Accident | local_authority_ons_district | E07000080 | Forest of Dean | NaN |
| 602 | Accident | local_authority_ons_district | E07000081 | Gloucester | NaN |
| 603 | Accident | local_authority_ons_district | E07000082 | Stroud | NaN |
| 604 | Accident | local_authority_ons_district | E07000083 | Tewkesbury | NaN |
| 605 | Accident | local_authority_ons_district | E07000084 | Basingstoke and Deane | NaN |
| 606 | Accident | local_authority_ons_district | E07000085 | East Hampshire | NaN |
| 607 | Accident | local_authority_ons_district | E07000086 | Eastleigh | NaN |
| 608 | Accident | local_authority_ons_district | E07000087 | Fareham | NaN |
| 609 | Accident | local_authority_ons_district | E07000088 | Gosport | NaN |
| 610 | Accident | local_authority_ons_district | E07000089 | Hart | NaN |
| 611 | Accident | local_authority_ons_district | E07000090 | Havant | NaN |
| 612 | Accident | local_authority_ons_district | E07000091 | New Forest | NaN |
| 613 | Accident | local_authority_ons_district | E07000092 | Rushmoor | NaN |
| 614 | Accident | local_authority_ons_district | E07000093 | Test Valley | NaN |
| 615 | Accident | local_authority_ons_district | E07000094 | Winchester | NaN |
| 616 | Accident | local_authority_ons_district | E07000095 | Broxbourne | NaN |
| 617 | Accident | local_authority_ons_district | E07000096 | Dacorum | NaN |
| 618 | Accident | local_authority_ons_district | E07000097 | East Hertfordshire | NaN |
| 619 | Accident | local_authority_ons_district | E07000098 | Hertsmere | NaN |
| 620 | Accident | local_authority_ons_district | E07000099 | North Hertfordshire | NaN |
| 621 | Accident | local_authority_ons_district | E07000100 | St Albans | NaN |
| 622 | Accident | local_authority_ons_district | E07000101 | Stevenage | NaN |
| 623 | Accident | local_authority_ons_district | E07000102 | Three Rivers | NaN |
| 624 | Accident | local_authority_ons_district | E07000103 | Watford | NaN |
| 625 | Accident | local_authority_ons_district | E07000104 | Welwyn Hatfield | NaN |
| 626 | Accident | local_authority_ons_district | E07000105 | Ashford | NaN |
| 627 | Accident | local_authority_ons_district | E07000106 | Canterbury | NaN |
| 628 | Accident | local_authority_ons_district | E07000107 | Dartford | NaN |
| 629 | Accident | local_authority_ons_district | E07000108 | Dover | NaN |
| 630 | Accident | local_authority_ons_district | E07000109 | Gravesham | NaN |
| 631 | Accident | local_authority_ons_district | E07000110 | Maidstone | NaN |
| 632 | Accident | local_authority_ons_district | E07000111 | Sevenoaks | NaN |
| 633 | Accident | local_authority_ons_district | E07000112 | Shepway | NaN |
| 634 | Accident | local_authority_ons_district | E07000113 | Swale | NaN |
| 635 | Accident | local_authority_ons_district | E07000114 | Thanet | NaN |
| 636 | Accident | local_authority_ons_district | E07000115 | Tonbridge and Malling | NaN |
| 637 | Accident | local_authority_ons_district | E07000116 | Tunbridge Wells | NaN |
| 638 | Accident | local_authority_ons_district | E07000117 | Burnley | NaN |
| 639 | Accident | local_authority_ons_district | E07000118 | Chorley | NaN |
| 640 | Accident | local_authority_ons_district | E07000119 | Fylde | NaN |
| 641 | Accident | local_authority_ons_district | E07000120 | Hyndburn | NaN |
| 642 | Accident | local_authority_ons_district | E07000121 | Lancaster | NaN |
| 643 | Accident | local_authority_ons_district | E07000122 | Pendle | NaN |
| 644 | Accident | local_authority_ons_district | E07000123 | Preston | NaN |
| 645 | Accident | local_authority_ons_district | E07000124 | Ribble Valley | NaN |
| 646 | Accident | local_authority_ons_district | E07000125 | Rossendale | NaN |
| 647 | Accident | local_authority_ons_district | E07000126 | South Ribble | NaN |
| 648 | Accident | local_authority_ons_district | E07000127 | West Lancashire | NaN |
| 649 | Accident | local_authority_ons_district | E07000128 | Wyre | NaN |
| 650 | Accident | local_authority_ons_district | E07000129 | Blaby | NaN |
| 651 | Accident | local_authority_ons_district | E07000130 | Charnwood | NaN |
| 652 | Accident | local_authority_ons_district | E07000131 | Harborough | NaN |
| 653 | Accident | local_authority_ons_district | E07000132 | Hinckley and Bosworth | NaN |
| 654 | Accident | local_authority_ons_district | E07000133 | Melton | NaN |
| 655 | Accident | local_authority_ons_district | E07000134 | North West Leicestershire | NaN |
| 656 | Accident | local_authority_ons_district | E07000135 | Oadby and Wigston | NaN |
| 657 | Accident | local_authority_ons_district | E07000136 | Boston | NaN |
| 658 | Accident | local_authority_ons_district | E07000137 | East Lindsey | NaN |
| 659 | Accident | local_authority_ons_district | E07000138 | Lincoln | NaN |
| 660 | Accident | local_authority_ons_district | E07000139 | North Kesteven | NaN |
| 661 | Accident | local_authority_ons_district | E07000140 | South Holland | NaN |
| 662 | Accident | local_authority_ons_district | E07000141 | South Kesteven | NaN |
| 663 | Accident | local_authority_ons_district | E07000142 | West Lindsey | NaN |
| 664 | Accident | local_authority_ons_district | E07000143 | Breckland | NaN |
| 665 | Accident | local_authority_ons_district | E07000144 | Broadland | NaN |
| 666 | Accident | local_authority_ons_district | E07000145 | Great Yarmouth | NaN |
| 667 | Accident | local_authority_ons_district | E07000146 | King's Lynn and West Norfolk | NaN |
| 668 | Accident | local_authority_ons_district | E07000147 | North Norfolk | NaN |
| 669 | Accident | local_authority_ons_district | E07000148 | Norwich | NaN |
| 670 | Accident | local_authority_ons_district | E07000149 | South Norfolk | NaN |
| 671 | Accident | local_authority_ons_district | E07000150 | Corby | NaN |
| 672 | Accident | local_authority_ons_district | E07000151 | Daventry | NaN |
| 673 | Accident | local_authority_ons_district | E07000152 | East Northamptonshire | NaN |
| 674 | Accident | local_authority_ons_district | E07000153 | Kettering | NaN |
| 675 | Accident | local_authority_ons_district | E07000154 | Northampton | NaN |
| 676 | Accident | local_authority_ons_district | E07000155 | South Northamptonshire | NaN |
| 677 | Accident | local_authority_ons_district | E07000156 | Wellingborough | NaN |
| 678 | Accident | local_authority_ons_district | E07000163 | Craven | NaN |
| 679 | Accident | local_authority_ons_district | E07000164 | Hambleton | NaN |
| 680 | Accident | local_authority_ons_district | E07000165 | Harrogate | NaN |
| 681 | Accident | local_authority_ons_district | E07000166 | Richmondshire | NaN |
| 682 | Accident | local_authority_ons_district | E07000167 | Ryedale | NaN |
| 683 | Accident | local_authority_ons_district | E07000168 | Scarborough | NaN |
| 684 | Accident | local_authority_ons_district | E07000169 | Selby | NaN |
| 685 | Accident | local_authority_ons_district | E07000170 | Ashfield | NaN |
| 686 | Accident | local_authority_ons_district | E07000171 | Bassetlaw | NaN |
| 687 | Accident | local_authority_ons_district | E07000172 | Broxtowe | NaN |
| 688 | Accident | local_authority_ons_district | E07000173 | Gedling | NaN |
| 689 | Accident | local_authority_ons_district | E07000174 | Mansfield | NaN |
| 690 | Accident | local_authority_ons_district | E07000175 | Newark and Sherwood | NaN |
| 691 | Accident | local_authority_ons_district | E07000176 | Rushcliffe | NaN |
| 692 | Accident | local_authority_ons_district | E07000177 | Cherwell | NaN |
| 693 | Accident | local_authority_ons_district | E07000178 | Oxford | NaN |
| 694 | Accident | local_authority_ons_district | E07000179 | South Oxfordshire | NaN |
| 695 | Accident | local_authority_ons_district | E07000180 | Vale of White Horse | NaN |
| 696 | Accident | local_authority_ons_district | E07000181 | West Oxfordshire | NaN |
| 697 | Accident | local_authority_ons_district | E07000187 | Mendip | NaN |
| 698 | Accident | local_authority_ons_district | E07000188 | Sedgemoor | NaN |
| 699 | Accident | local_authority_ons_district | E07000189 | South Somerset | NaN |
| 700 | Accident | local_authority_ons_district | E07000190 | Taunton Deane | NaN |
| 701 | Accident | local_authority_ons_district | E07000191 | West Somerset | NaN |
| 702 | Accident | local_authority_ons_district | E07000192 | Cannock Chase | NaN |
| 703 | Accident | local_authority_ons_district | E07000193 | East Staffordshire | NaN |
| 704 | Accident | local_authority_ons_district | E07000194 | Lichfield | NaN |
| 705 | Accident | local_authority_ons_district | E07000195 | Newcastle-under-Lyme | NaN |
| 706 | Accident | local_authority_ons_district | E07000196 | South Staffordshire | NaN |
| 707 | Accident | local_authority_ons_district | E07000197 | Stafford | NaN |
| 708 | Accident | local_authority_ons_district | E07000198 | Staffordshire Moorlands | NaN |
| 709 | Accident | local_authority_ons_district | E07000199 | Tamworth | NaN |
| 710 | Accident | local_authority_ons_district | E07000200 | Babergh | NaN |
| 711 | Accident | local_authority_ons_district | E07000201 | Forest Heath | NaN |
| 712 | Accident | local_authority_ons_district | E07000202 | Ipswich | NaN |
| 713 | Accident | local_authority_ons_district | E07000203 | Mid Suffolk | NaN |
| 714 | Accident | local_authority_ons_district | E07000204 | St Edmundsbury | NaN |
| 715 | Accident | local_authority_ons_district | E07000205 | Suffolk Coastal | NaN |
| 716 | Accident | local_authority_ons_district | E07000206 | Waveney | NaN |
| 717 | Accident | local_authority_ons_district | E07000207 | Elmbridge | NaN |
| 718 | Accident | local_authority_ons_district | E07000208 | Epsom and Ewell | NaN |
| 719 | Accident | local_authority_ons_district | E07000209 | Guildford | NaN |
| 720 | Accident | local_authority_ons_district | E07000210 | Mole Valley | NaN |
| 721 | Accident | local_authority_ons_district | E07000211 | Reigate and Banstead | NaN |
| 722 | Accident | local_authority_ons_district | E07000212 | Runnymede | NaN |
| 723 | Accident | local_authority_ons_district | E07000213 | Spelthorne | NaN |
| 724 | Accident | local_authority_ons_district | E07000214 | Surrey Heath | NaN |
| 725 | Accident | local_authority_ons_district | E07000215 | Tandridge | NaN |
| 726 | Accident | local_authority_ons_district | E07000216 | Waverley | NaN |
| 727 | Accident | local_authority_ons_district | E07000217 | Woking | NaN |
| 728 | Accident | local_authority_ons_district | E07000218 | North Warwickshire | NaN |
| 729 | Accident | local_authority_ons_district | E07000219 | Nuneaton and Bedworth | NaN |
| 730 | Accident | local_authority_ons_district | E07000220 | Rugby | NaN |
| 731 | Accident | local_authority_ons_district | E07000221 | Stratford-on-Avon | NaN |
| 732 | Accident | local_authority_ons_district | E07000222 | Warwick | NaN |
| 733 | Accident | local_authority_ons_district | E07000223 | Adur | NaN |
| 734 | Accident | local_authority_ons_district | E07000224 | Arun | NaN |
| 735 | Accident | local_authority_ons_district | E07000225 | Chichester | NaN |
| 736 | Accident | local_authority_ons_district | E07000226 | Crawley | NaN |
| 737 | Accident | local_authority_ons_district | E07000227 | Horsham | NaN |
| 738 | Accident | local_authority_ons_district | E07000228 | Mid Sussex | NaN |
| 739 | Accident | local_authority_ons_district | E07000229 | Worthing | NaN |
| 740 | Accident | local_authority_ons_district | E07000234 | Bromsgrove | NaN |
| 741 | Accident | local_authority_ons_district | E07000235 | Malvern Hills | NaN |
| 742 | Accident | local_authority_ons_district | E07000236 | Redditch | NaN |
| 743 | Accident | local_authority_ons_district | E07000237 | Worcester | NaN |
| 744 | Accident | local_authority_ons_district | E07000238 | Wychavon | NaN |
| 745 | Accident | local_authority_ons_district | E07000239 | Wyre Forest | NaN |
| 746 | Accident | local_authority_ons_district | E07000240 | St Albans | NaN |
| 747 | Accident | local_authority_ons_district | E07000241 | Welwyn Hatfield | NaN |
| 748 | Accident | local_authority_ons_district | E07000242 | East Hertfordshire | NaN |
| 749 | Accident | local_authority_ons_district | E07000243 | Stevenage | NaN |
| 750 | Accident | local_authority_ons_district | E07000244 | East Suffolk | NaN |
| 751 | Accident | local_authority_ons_district | E07000245 | West Suffolk | NaN |
| 752 | Accident | local_authority_ons_district | E08000001 | Bolton | NaN |
| 753 | Accident | local_authority_ons_district | E08000002 | Bury | NaN |
| 754 | Accident | local_authority_ons_district | E08000003 | Manchester | NaN |
| 755 | Accident | local_authority_ons_district | E08000004 | Oldham | NaN |
| 756 | Accident | local_authority_ons_district | E08000005 | Rochdale | NaN |
| 757 | Accident | local_authority_ons_district | E08000006 | Salford | NaN |
| 758 | Accident | local_authority_ons_district | E08000007 | Stockport | NaN |
| 759 | Accident | local_authority_ons_district | E08000008 | Tameside | NaN |
| 760 | Accident | local_authority_ons_district | E08000009 | Trafford | NaN |
| 761 | Accident | local_authority_ons_district | E08000010 | Wigan | NaN |
| 762 | Accident | local_authority_ons_district | E08000011 | Knowsley | NaN |
| 763 | Accident | local_authority_ons_district | E08000012 | Liverpool | NaN |
| 764 | Accident | local_authority_ons_district | E08000013 | St. Helens | NaN |
| 765 | Accident | local_authority_ons_district | E08000014 | Sefton | NaN |
| 766 | Accident | local_authority_ons_district | E08000015 | Wirral | NaN |
| 767 | Accident | local_authority_ons_district | E08000016 | Barnsley | NaN |
| 768 | Accident | local_authority_ons_district | E08000017 | Doncaster | NaN |
| 769 | Accident | local_authority_ons_district | E08000018 | Rotherham | NaN |
| 770 | Accident | local_authority_ons_district | E08000019 | Sheffield | NaN |
| 771 | Accident | local_authority_ons_district | E08000020 | Gateshead | NaN |
| 772 | Accident | local_authority_ons_district | E08000021 | Newcastle upon Tyne | NaN |
| 773 | Accident | local_authority_ons_district | E08000022 | North Tyneside | NaN |
| 774 | Accident | local_authority_ons_district | E08000023 | South Tyneside | NaN |
| 775 | Accident | local_authority_ons_district | E08000024 | Sunderland | NaN |
| 776 | Accident | local_authority_ons_district | E08000025 | Birmingham | NaN |
| 777 | Accident | local_authority_ons_district | E08000026 | Coventry | NaN |
| 778 | Accident | local_authority_ons_district | E08000027 | Dudley | NaN |
| 779 | Accident | local_authority_ons_district | E08000028 | Sandwell | NaN |
| 780 | Accident | local_authority_ons_district | E08000029 | Solihull | NaN |
| 781 | Accident | local_authority_ons_district | E08000030 | Walsall | NaN |
| 782 | Accident | local_authority_ons_district | E08000031 | Wolverhampton | NaN |
| 783 | Accident | local_authority_ons_district | E08000032 | Bradford | NaN |
| 784 | Accident | local_authority_ons_district | E08000033 | Calderdale | NaN |
| 785 | Accident | local_authority_ons_district | E08000034 | Kirklees | NaN |
| 786 | Accident | local_authority_ons_district | E08000035 | Leeds | NaN |
| 787 | Accident | local_authority_ons_district | E08000036 | Wakefield | NaN |
| 788 | Accident | local_authority_ons_district | E09000001 | City of London | NaN |
| 789 | Accident | local_authority_ons_district | E09000001 | City of London | NaN |
| 790 | Accident | local_authority_ons_district | E09000002 | Barking and Dagenham | NaN |
| 791 | Accident | local_authority_ons_district | E09000003 | Barnet | NaN |
| 792 | Accident | local_authority_ons_district | E09000004 | Bexley | NaN |
| 793 | Accident | local_authority_ons_district | E09000005 | Brent | NaN |
| 794 | Accident | local_authority_ons_district | E09000006 | Bromley | NaN |
| 795 | Accident | local_authority_ons_district | E09000007 | Camden | NaN |
| 796 | Accident | local_authority_ons_district | E09000008 | Croydon | NaN |
| 797 | Accident | local_authority_ons_district | E09000009 | Ealing | NaN |
| 798 | Accident | local_authority_ons_district | E09000010 | Enfield | NaN |
| 799 | Accident | local_authority_ons_district | E09000011 | Greenwich | NaN |
| 800 | Accident | local_authority_ons_district | E09000012 | Hackney | NaN |
| 801 | Accident | local_authority_ons_district | E09000013 | Hammersmith and Fulham | NaN |
| 802 | Accident | local_authority_ons_district | E09000014 | Haringey | NaN |
| 803 | Accident | local_authority_ons_district | E09000015 | Harrow | NaN |
| 804 | Accident | local_authority_ons_district | E09000016 | Havering | NaN |
| 805 | Accident | local_authority_ons_district | E09000017 | Hillingdon | NaN |
| 806 | Accident | local_authority_ons_district | E09000018 | Hounslow | NaN |
| 807 | Accident | local_authority_ons_district | E09000019 | Islington | NaN |
| 808 | Accident | local_authority_ons_district | E09000020 | Kensington and Chelsea | NaN |
| 809 | Accident | local_authority_ons_district | E09000021 | Kingston upon Thames | NaN |
| 810 | Accident | local_authority_ons_district | E09000022 | Lambeth | NaN |
| 811 | Accident | local_authority_ons_district | E09000023 | Lewisham | NaN |
| 812 | Accident | local_authority_ons_district | E09000024 | Merton | NaN |
| 813 | Accident | local_authority_ons_district | E09000025 | Newham | NaN |
| 814 | Accident | local_authority_ons_district | E09000026 | Redbridge | NaN |
| 815 | Accident | local_authority_ons_district | E09000027 | Richmond upon Thames | NaN |
| 816 | Accident | local_authority_ons_district | E09000028 | Southwark | NaN |
| 817 | Accident | local_authority_ons_district | E09000029 | Sutton | NaN |
| 818 | Accident | local_authority_ons_district | E09000030 | Tower Hamlets | NaN |
| 819 | Accident | local_authority_ons_district | E09000031 | Waltham Forest | NaN |
| 820 | Accident | local_authority_ons_district | E09000032 | Wandsworth | NaN |
| 821 | Accident | local_authority_ons_district | E09000033 | Westminster | NaN |
| 822 | Accident | local_authority_ons_district | EHEATHROW | London Airport (Heathrow) | NaN |
| 823 | Accident | local_authority_ons_district | S12000005 | Clackmannanshire | NaN |
| 824 | Accident | local_authority_ons_district | S12000005 | Clackmannanshire | NaN |
| 825 | Accident | local_authority_ons_district | S12000006 | Dumfries and Galloway | NaN |
| 826 | Accident | local_authority_ons_district | S12000006 | Dumfries and Galloway | NaN |
| 827 | Accident | local_authority_ons_district | S12000008 | East Ayrshire | NaN |
| 828 | Accident | local_authority_ons_district | S12000008 | East Ayrshire | NaN |
| 829 | Accident | local_authority_ons_district | S12000009 | East Dunbartonshire | NaN |
| 830 | Accident | local_authority_ons_district | S12000009 | East Dunbartonshire | NaN |
| 831 | Accident | local_authority_ons_district | S12000010 | East Lothian | NaN |
| 832 | Accident | local_authority_ons_district | S12000010 | East Lothian | NaN |
| 833 | Accident | local_authority_ons_district | S12000011 | East Renfrewshire | NaN |
| 834 | Accident | local_authority_ons_district | S12000011 | East Renfrewshire | NaN |
| 835 | Accident | local_authority_ons_district | S12000013 | Comhairle nan Eilean Siar | NaN |
| 836 | Accident | local_authority_ons_district | S12000013 | Comhairle nan Eilean Siar | NaN |
| 837 | Accident | local_authority_ons_district | S12000014 | Falkirk | NaN |
| 838 | Accident | local_authority_ons_district | S12000014 | Falkirk | NaN |
| 839 | Accident | local_authority_ons_district | S12000015 | Fife | NaN |
| 840 | Accident | local_authority_ons_district | S12000015 | Fife | NaN |
| 841 | Accident | local_authority_ons_district | S12000017 | Highland | NaN |
| 842 | Accident | local_authority_ons_district | S12000017 | Highland | NaN |
| 843 | Accident | local_authority_ons_district | S12000018 | Inverclyde | NaN |
| 844 | Accident | local_authority_ons_district | S12000018 | Inverclyde | NaN |
| 845 | Accident | local_authority_ons_district | S12000019 | Midlothian | NaN |
| 846 | Accident | local_authority_ons_district | S12000019 | Midlothian | NaN |
| 847 | Accident | local_authority_ons_district | S12000020 | Moray | NaN |
| 848 | Accident | local_authority_ons_district | S12000020 | Moray | NaN |
| 849 | Accident | local_authority_ons_district | S12000021 | North Ayrshire | NaN |
| 850 | Accident | local_authority_ons_district | S12000021 | North Ayrshire | NaN |
| 851 | Accident | local_authority_ons_district | S12000023 | Orkney Islands | NaN |
| 852 | Accident | local_authority_ons_district | S12000023 | Orkney Islands | NaN |
| 853 | Accident | local_authority_ons_district | S12000024 | Perth and Kinross | NaN |
| 854 | Accident | local_authority_ons_district | S12000024 | Perth and Kinross | NaN |
| 855 | Accident | local_authority_ons_district | S12000026 | Scottish Borders | NaN |
| 856 | Accident | local_authority_ons_district | S12000026 | Scottish Borders | NaN |
| 857 | Accident | local_authority_ons_district | S12000027 | Shetland Islands | NaN |
| 858 | Accident | local_authority_ons_district | S12000027 | Shetland Islands | NaN |
| 859 | Accident | local_authority_ons_district | S12000028 | South Ayrshire | NaN |
| 860 | Accident | local_authority_ons_district | S12000028 | South Ayrshire | NaN |
| 861 | Accident | local_authority_ons_district | S12000029 | South Lanarkshire | NaN |
| 862 | Accident | local_authority_ons_district | S12000029 | South Lanarkshire | NaN |
| 863 | Accident | local_authority_ons_district | S12000030 | Stirling | NaN |
| 864 | Accident | local_authority_ons_district | S12000030 | Stirling | NaN |
| 865 | Accident | local_authority_ons_district | S12000033 | Aberdeen City | NaN |
| 866 | Accident | local_authority_ons_district | S12000033 | Aberdeen City | NaN |
| 867 | Accident | local_authority_ons_district | S12000034 | Aberdeenshire | NaN |
| 868 | Accident | local_authority_ons_district | S12000034 | Aberdeenshire | NaN |
| 869 | Accident | local_authority_ons_district | S12000035 | Argyll and Bute | NaN |
| 870 | Accident | local_authority_ons_district | S12000035 | Argyll and Bute | NaN |
| 871 | Accident | local_authority_ons_district | S12000036 | City of Edinburgh | NaN |
| 872 | Accident | local_authority_ons_district | S12000036 | City of Edinburgh | NaN |
| 873 | Accident | local_authority_ons_district | S12000038 | Renfrewshire | NaN |
| 874 | Accident | local_authority_ons_district | S12000038 | Renfrewshire | NaN |
| 875 | Accident | local_authority_ons_district | S12000039 | West Dunbartonshire | NaN |
| 876 | Accident | local_authority_ons_district | S12000039 | West Dunbartonshire | NaN |
| 877 | Accident | local_authority_ons_district | S12000040 | West Lothian | NaN |
| 878 | Accident | local_authority_ons_district | S12000040 | West Lothian | NaN |
| 879 | Accident | local_authority_ons_district | S12000041 | Angus | NaN |
| 880 | Accident | local_authority_ons_district | S12000041 | Angus | NaN |
| 881 | Accident | local_authority_ons_district | S12000042 | Dundee City | NaN |
| 882 | Accident | local_authority_ons_district | S12000042 | Dundee City | NaN |
| 883 | Accident | local_authority_ons_district | S12000043 | Glasgow City | NaN |
| 884 | Accident | local_authority_ons_district | S12000043 | Glasgow City | NaN |
| 885 | Accident | local_authority_ons_district | S12000044 | North Lanarkshire | NaN |
| 886 | Accident | local_authority_ons_district | S12000044 | North Lanarkshire | NaN |
| 887 | Accident | local_authority_ons_district | S12000045 | East Dunbartonshire | NaN |
| 888 | Accident | local_authority_ons_district | S12000046 | Glasgow City | NaN |
| 889 | Accident | local_authority_ons_district | W06000001 | Isle of Anglesey | NaN |
| 890 | Accident | local_authority_ons_district | W06000002 | Gwynedd | NaN |
| 891 | Accident | local_authority_ons_district | W06000003 | Conwy | NaN |
| 892 | Accident | local_authority_ons_district | W06000004 | Denbighshire | NaN |
| 893 | Accident | local_authority_ons_district | W06000005 | Flintshire | NaN |
| 894 | Accident | local_authority_ons_district | W06000006 | Wrexham | NaN |
| 895 | Accident | local_authority_ons_district | W06000008 | Ceredigion | NaN |
| 896 | Accident | local_authority_ons_district | W06000009 | Pembrokeshire | NaN |
| 897 | Accident | local_authority_ons_district | W06000010 | Carmarthenshire | NaN |
| 898 | Accident | local_authority_ons_district | W06000011 | Swansea | NaN |
| 899 | Accident | local_authority_ons_district | W06000012 | Neath Port Talbot | NaN |
| 900 | Accident | local_authority_ons_district | W06000013 | Bridgend | NaN |
| 901 | Accident | local_authority_ons_district | W06000014 | Vale of Glamorgan | NaN |
| 902 | Accident | local_authority_ons_district | W06000015 | Cardiff | NaN |
| 903 | Accident | local_authority_ons_district | W06000016 | Rhondda Cynon Taf | NaN |
| 904 | Accident | local_authority_ons_district | W06000018 | Caerphilly | NaN |
| 905 | Accident | local_authority_ons_district | W06000019 | Blaenau Gwent | NaN |
| 906 | Accident | local_authority_ons_district | W06000020 | Torfaen | NaN |
| 907 | Accident | local_authority_ons_district | W06000021 | Monmouthshire | NaN |
| 908 | Accident | local_authority_ons_district | W06000022 | Newport | NaN |
| 909 | Accident | local_authority_ons_district | W06000023 | Powys | NaN |
| 910 | Accident | local_authority_ons_district | W06000024 | Merthyr Tydfil | NaN |
| 911 | Accident | local_authority_highway | E06000001 | Hartlepool | NaN |
| 912 | Accident | local_authority_highway | E06000002 | Middlesbrough | NaN |
| 913 | Accident | local_authority_highway | E06000003 | Redcar and Cleveland | NaN |
| 914 | Accident | local_authority_highway | E06000004 | Stockton-on-Tees | NaN |
| 915 | Accident | local_authority_highway | E06000005 | Darlington | NaN |
| 916 | Accident | local_authority_highway | E06000006 | Halton | NaN |
| 917 | Accident | local_authority_highway | E06000007 | Warrington | NaN |
| 918 | Accident | local_authority_highway | E06000008 | Blackburn with Darwen | NaN |
| 919 | Accident | local_authority_highway | E06000009 | Blackpool | NaN |
| 920 | Accident | local_authority_highway | E06000010 | Kingston upon Hull, City of | NaN |
| 921 | Accident | local_authority_highway | E06000011 | East Riding of Yorkshire | NaN |
| 922 | Accident | local_authority_highway | E06000012 | North East Lincolnshire | NaN |
| 923 | Accident | local_authority_highway | E06000013 | North Lincolnshire | NaN |
| 924 | Accident | local_authority_highway | E06000014 | York | NaN |
| 925 | Accident | local_authority_highway | E06000015 | Derby | NaN |
| 926 | Accident | local_authority_highway | E06000016 | Leicester | NaN |
| 927 | Accident | local_authority_highway | E06000017 | Rutland | NaN |
| 928 | Accident | local_authority_highway | E06000018 | Nottingham | NaN |
| 929 | Accident | local_authority_highway | E06000019 | Herefordshire, County of | NaN |
| 930 | Accident | local_authority_highway | E06000020 | Telford and Wrekin | NaN |
| 931 | Accident | local_authority_highway | E06000021 | Stoke-on-Trent | NaN |
| 932 | Accident | local_authority_highway | E06000022 | Bath and North East Somerset | NaN |
| 933 | Accident | local_authority_highway | E06000023 | Bristol, City of | NaN |
| 934 | Accident | local_authority_highway | E06000024 | North Somerset | NaN |
| 935 | Accident | local_authority_highway | E06000025 | South Gloucestershire | NaN |
| 936 | Accident | local_authority_highway | E06000026 | Plymouth | NaN |
| 937 | Accident | local_authority_highway | E06000027 | Torbay | NaN |
| 938 | Accident | local_authority_highway | E06000028 | Bournemouth | NaN |
| 939 | Accident | local_authority_highway | E06000029 | Poole | NaN |
| 940 | Accident | local_authority_highway | E06000030 | Swindon | NaN |
| 941 | Accident | local_authority_highway | E06000031 | Peterborough | NaN |
| 942 | Accident | local_authority_highway | E06000032 | Luton | NaN |
| 943 | Accident | local_authority_highway | E06000033 | Southend-on-Sea | NaN |
| 944 | Accident | local_authority_highway | E06000034 | Thurrock | NaN |
| 945 | Accident | local_authority_highway | E06000035 | Medway | NaN |
| 946 | Accident | local_authority_highway | E06000036 | Bracknell Forest | NaN |
| 947 | Accident | local_authority_highway | E06000037 | West Berkshire | NaN |
| 948 | Accident | local_authority_highway | E06000038 | Reading | NaN |
| 949 | Accident | local_authority_highway | E06000039 | Slough | NaN |
| 950 | Accident | local_authority_highway | E06000040 | Windsor and Maidenhead | NaN |
| 951 | Accident | local_authority_highway | E06000041 | Wokingham | NaN |
| 952 | Accident | local_authority_highway | E06000042 | Milton Keynes | NaN |
| 953 | Accident | local_authority_highway | E06000043 | Brighton and Hove | NaN |
| 954 | Accident | local_authority_highway | E06000044 | Portsmouth | NaN |
| 955 | Accident | local_authority_highway | E06000045 | Southampton | NaN |
| 956 | Accident | local_authority_highway | E06000046 | Isle of Wight | NaN |
| 957 | Accident | local_authority_highway | E06000047 | County Durham | NaN |
| 958 | Accident | local_authority_highway | E06000048 | Northumberland | NaN |
| 959 | Accident | local_authority_highway | E06000049 | Cheshire East | NaN |
| 960 | Accident | local_authority_highway | E06000050 | Cheshire West and Chester | NaN |
| 961 | Accident | local_authority_highway | E06000051 | Shropshire | NaN |
| 962 | Accident | local_authority_highway | E06000052 | Cornwall | NaN |
| 963 | Accident | local_authority_highway | E06000053 | Isles of Scilly | NaN |
| 964 | Accident | local_authority_highway | E06000054 | Wiltshire | NaN |
| 965 | Accident | local_authority_highway | E06000055 | Bedford | NaN |
| 966 | Accident | local_authority_highway | E06000056 | Central Bedfordshire | NaN |
| 967 | Accident | local_authority_highway | E08000001 | Bolton | NaN |
| 968 | Accident | local_authority_highway | E08000002 | Bury | NaN |
| 969 | Accident | local_authority_highway | E08000003 | Manchester | NaN |
| 970 | Accident | local_authority_highway | E08000004 | Oldham | NaN |
| 971 | Accident | local_authority_highway | E08000005 | Rochdale | NaN |
| 972 | Accident | local_authority_highway | E08000006 | Salford | NaN |
| 973 | Accident | local_authority_highway | E08000007 | Stockport | NaN |
| 974 | Accident | local_authority_highway | E08000008 | Tameside | NaN |
| 975 | Accident | local_authority_highway | E08000009 | Trafford | NaN |
| 976 | Accident | local_authority_highway | E08000010 | Wigan | NaN |
| 977 | Accident | local_authority_highway | E08000011 | Knowsley | NaN |
| 978 | Accident | local_authority_highway | E08000012 | Liverpool | NaN |
| 979 | Accident | local_authority_highway | E08000013 | St. Helens | NaN |
| 980 | Accident | local_authority_highway | E08000014 | Sefton | NaN |
| 981 | Accident | local_authority_highway | E08000015 | Wirral | NaN |
| 982 | Accident | local_authority_highway | E08000016 | Barnsley | NaN |
| 983 | Accident | local_authority_highway | E08000017 | Doncaster | NaN |
| 984 | Accident | local_authority_highway | E08000018 | Rotherham | NaN |
| 985 | Accident | local_authority_highway | E08000019 | Sheffield | NaN |
| 986 | Accident | local_authority_highway | E08000020 | Gateshead | NaN |
| 987 | Accident | local_authority_highway | E08000021 | Newcastle upon Tyne | NaN |
| 988 | Accident | local_authority_highway | E08000022 | North Tyneside | NaN |
| 989 | Accident | local_authority_highway | E08000023 | South Tyneside | NaN |
| 990 | Accident | local_authority_highway | E08000024 | Sunderland | NaN |
| 991 | Accident | local_authority_highway | E08000025 | Birmingham | NaN |
| 992 | Accident | local_authority_highway | E08000026 | Coventry | NaN |
| 993 | Accident | local_authority_highway | E08000027 | Dudley | NaN |
| 994 | Accident | local_authority_highway | E08000028 | Sandwell | NaN |
| 995 | Accident | local_authority_highway | E08000029 | Solihull | NaN |
| 996 | Accident | local_authority_highway | E08000030 | Walsall | NaN |
| 997 | Accident | local_authority_highway | E08000031 | Wolverhampton | NaN |
| 998 | Accident | local_authority_highway | E08000032 | Bradford | NaN |
| 999 | Accident | local_authority_highway | E08000033 | Calderdale | NaN |
| 1000 | Accident | local_authority_highway | E08000034 | Kirklees | NaN |
| 1001 | Accident | local_authority_highway | E08000035 | Leeds | NaN |
| 1002 | Accident | local_authority_highway | E08000036 | Wakefield | NaN |
| 1003 | Accident | local_authority_highway | E09000001 | City of London | NaN |
| 1004 | Accident | local_authority_highway | E09000002 | Barking and Dagenham | NaN |
| 1005 | Accident | local_authority_highway | E09000003 | Barnet | NaN |
| 1006 | Accident | local_authority_highway | E09000004 | Bexley | NaN |
| 1007 | Accident | local_authority_highway | E09000005 | Brent | NaN |
| 1008 | Accident | local_authority_highway | E09000006 | Bromley | NaN |
| 1009 | Accident | local_authority_highway | E09000007 | Camden | NaN |
| 1010 | Accident | local_authority_highway | E09000008 | Croydon | NaN |
| 1011 | Accident | local_authority_highway | E09000009 | Ealing | NaN |
| 1012 | Accident | local_authority_highway | E09000010 | Enfield | NaN |
| 1013 | Accident | local_authority_highway | E09000011 | Greenwich | NaN |
| 1014 | Accident | local_authority_highway | E09000012 | Hackney | NaN |
| 1015 | Accident | local_authority_highway | E09000013 | Hammersmith and Fulham | NaN |
| 1016 | Accident | local_authority_highway | E09000014 | Haringey | NaN |
| 1017 | Accident | local_authority_highway | E09000015 | Harrow | NaN |
| 1018 | Accident | local_authority_highway | E09000016 | Havering | NaN |
| 1019 | Accident | local_authority_highway | E09000017 | Hillingdon | NaN |
| 1020 | Accident | local_authority_highway | E09000018 | Hounslow | NaN |
| 1021 | Accident | local_authority_highway | E09000019 | Islington | NaN |
| 1022 | Accident | local_authority_highway | E09000020 | Kensington and Chelsea | NaN |
| 1023 | Accident | local_authority_highway | E09000021 | Kingston upon Thames | NaN |
| 1024 | Accident | local_authority_highway | E09000022 | Lambeth | NaN |
| 1025 | Accident | local_authority_highway | E09000023 | Lewisham | NaN |
| 1026 | Accident | local_authority_highway | E09000024 | Merton | NaN |
| 1027 | Accident | local_authority_highway | E09000025 | Newham | NaN |
| 1028 | Accident | local_authority_highway | E09000026 | Redbridge | NaN |
| 1029 | Accident | local_authority_highway | E09000027 | Richmond upon Thames | NaN |
| 1030 | Accident | local_authority_highway | E09000028 | Southwark | NaN |
| 1031 | Accident | local_authority_highway | E09000029 | Sutton | NaN |
| 1032 | Accident | local_authority_highway | E09000030 | Tower Hamlets | NaN |
| 1033 | Accident | local_authority_highway | E09000031 | Waltham Forest | NaN |
| 1034 | Accident | local_authority_highway | E09000032 | Wandsworth | NaN |
| 1035 | Accident | local_authority_highway | E09000033 | Westminster | NaN |
| 1036 | Accident | local_authority_highway | E10000002 | Buckinghamshire | NaN |
| 1037 | Accident | local_authority_highway | E10000003 | Cambridgeshire | NaN |
| 1038 | Accident | local_authority_highway | E10000006 | Cumbria | NaN |
| 1039 | Accident | local_authority_highway | E10000007 | Derbyshire | NaN |
| 1040 | Accident | local_authority_highway | E10000008 | Devon | NaN |
| 1041 | Accident | local_authority_highway | E10000009 | Dorset | NaN |
| 1042 | Accident | local_authority_highway | E10000011 | East Sussex | NaN |
| 1043 | Accident | local_authority_highway | E10000012 | Essex | NaN |
| 1044 | Accident | local_authority_highway | E10000013 | Gloucestershire | NaN |
| 1045 | Accident | local_authority_highway | E10000014 | Hampshire | NaN |
| 1046 | Accident | local_authority_highway | E10000015 | Hertfordshire | NaN |
| 1047 | Accident | local_authority_highway | E10000016 | Kent | NaN |
| 1048 | Accident | local_authority_highway | E10000017 | Lancashire | NaN |
| 1049 | Accident | local_authority_highway | E10000018 | Leicestershire | NaN |
| 1050 | Accident | local_authority_highway | E10000019 | Lincolnshire | NaN |
| 1051 | Accident | local_authority_highway | E10000020 | Norfolk | NaN |
| 1052 | Accident | local_authority_highway | E10000021 | Northamptonshire | NaN |
| 1053 | Accident | local_authority_highway | E10000023 | North Yorkshire | NaN |
| 1054 | Accident | local_authority_highway | E10000024 | Nottinghamshire | NaN |
| 1055 | Accident | local_authority_highway | E10000025 | Oxfordshire | NaN |
| 1056 | Accident | local_authority_highway | E10000027 | Somerset | NaN |
| 1057 | Accident | local_authority_highway | E10000028 | Staffordshire | NaN |
| 1058 | Accident | local_authority_highway | E10000029 | Suffolk | NaN |
| 1059 | Accident | local_authority_highway | E10000030 | Surrey | NaN |
| 1060 | Accident | local_authority_highway | E10000031 | Warwickshire | NaN |
| 1061 | Accident | local_authority_highway | E10000032 | West Sussex | NaN |
| 1062 | Accident | local_authority_highway | E10000034 | Worcestershire | NaN |
| 1063 | Accident | local_authority_highway | EHEATHROW | London Airport (Heathrow) | NaN |
| 1064 | Accident | local_authority_highway | S12000005 | Clackmannanshire | NaN |
| 1065 | Accident | local_authority_highway | S12000006 | Dumfries & Galloway | NaN |
| 1066 | Accident | local_authority_highway | S12000008 | East Ayrshire | NaN |
| 1067 | Accident | local_authority_highway | S12000009 | East Dunbartonshire | NaN |
| 1068 | Accident | local_authority_highway | S12000010 | East Lothian | NaN |
| 1069 | Accident | local_authority_highway | S12000011 | East Renfrewshire | NaN |
| 1070 | Accident | local_authority_highway | S12000013 | Na h-Eileanan an Iar (Western Isles) | NaN |
| 1071 | Accident | local_authority_highway | S12000014 | Falkirk | NaN |
| 1072 | Accident | local_authority_highway | S12000015 | Fife | NaN |
| 1073 | Accident | local_authority_highway | S12000017 | Highland | NaN |
| 1074 | Accident | local_authority_highway | S12000018 | Inverclyde | NaN |
| 1075 | Accident | local_authority_highway | S12000019 | Midlothian | NaN |
| 1076 | Accident | local_authority_highway | S12000020 | Moray | NaN |
| 1077 | Accident | local_authority_highway | S12000021 | North Ayrshire | NaN |
| 1078 | Accident | local_authority_highway | S12000023 | Orkney Islands | NaN |
| 1079 | Accident | local_authority_highway | S12000024 | Perth and Kinross | NaN |
| 1080 | Accident | local_authority_highway | S12000026 | Scottish Borders | NaN |
| 1081 | Accident | local_authority_highway | S12000027 | Shetland Islands | NaN |
| 1082 | Accident | local_authority_highway | S12000028 | South Ayrshire | NaN |
| 1083 | Accident | local_authority_highway | S12000029 | South Lanarkshire | NaN |
| 1084 | Accident | local_authority_highway | S12000030 | Stirling | NaN |
| 1085 | Accident | local_authority_highway | S12000033 | Aberdeen City | NaN |
| 1086 | Accident | local_authority_highway | S12000034 | Aberdeenshire | NaN |
| 1087 | Accident | local_authority_highway | S12000035 | Argyll & Bute | NaN |
| 1088 | Accident | local_authority_highway | S12000036 | Edinburgh, City of | NaN |
| 1089 | Accident | local_authority_highway | S12000038 | Renfrewshire | NaN |
| 1090 | Accident | local_authority_highway | S12000039 | West Dunbartonshire | NaN |
| 1091 | Accident | local_authority_highway | S12000040 | West Lothian | NaN |
| 1092 | Accident | local_authority_highway | S12000041 | Angus | NaN |
| 1093 | Accident | local_authority_highway | S12000042 | Dundee City | NaN |
| 1094 | Accident | local_authority_highway | S12000043 | Glasgow City | NaN |
| 1095 | Accident | local_authority_highway | S12000044 | North Lanarkshire | NaN |
| 1096 | Accident | local_authority_highway | W06000001 | Isle of Anglesey | NaN |
| 1097 | Accident | local_authority_highway | W06000002 | Gwynedd | NaN |
| 1098 | Accident | local_authority_highway | W06000003 | Conwy | NaN |
| 1099 | Accident | local_authority_highway | W06000004 | Denbighshire | NaN |
| 1100 | Accident | local_authority_highway | W06000005 | Flintshire | NaN |
| 1101 | Accident | local_authority_highway | W06000006 | Wrexham | NaN |
| 1102 | Accident | local_authority_highway | W06000008 | Ceredigion | NaN |
| 1103 | Accident | local_authority_highway | W06000009 | Pembrokeshire | NaN |
| 1104 | Accident | local_authority_highway | W06000010 | Carmarthenshire | NaN |
| 1105 | Accident | local_authority_highway | W06000011 | Swansea | NaN |
| 1106 | Accident | local_authority_highway | W06000012 | Neath Port Talbot | NaN |
| 1107 | Accident | local_authority_highway | W06000013 | Bridgend | NaN |
| 1108 | Accident | local_authority_highway | W06000014 | The Vale of Glamorgan | NaN |
| 1109 | Accident | local_authority_highway | W06000015 | Cardiff | NaN |
| 1110 | Accident | local_authority_highway | W06000016 | Rhondda, Cynon, Taff | NaN |
| 1111 | Accident | local_authority_highway | W06000018 | Caerphilly | NaN |
| 1112 | Accident | local_authority_highway | W06000019 | Blaenau Gwent | NaN |
| 1113 | Accident | local_authority_highway | W06000020 | Torfaen | NaN |
| 1114 | Accident | local_authority_highway | W06000021 | Monmouthshire | NaN |
| 1115 | Accident | local_authority_highway | W06000022 | Newport | NaN |
| 1116 | Accident | local_authority_highway | W06000023 | Powys | NaN |
| 1117 | Accident | local_authority_highway | W06000024 | Merthyr Tydfil | NaN |
| 1118 | Accident | first_road_class | 1 | Motorway | NaN |
| 1119 | Accident | first_road_class | 2 | A(M) | NaN |
| 1120 | Accident | first_road_class | 3 | A | NaN |
| 1121 | Accident | first_road_class | 4 | B | NaN |
| 1122 | Accident | first_road_class | 5 | C | NaN |
| 1123 | Accident | first_road_class | 6 | Unclassified | NaN |
| 1124 | Accident | first_road_number | 1 to 9999 | Number range | NaN |
| 1125 | Accident | first_road_number | -1 | Unknown | NaN |
| 1126 | Accident | first_road_number | 0 | first_road_class is C or Unclassified. These r... | NaN |
| 1127 | Accident | road_type | 1 | Roundabout | NaN |
| 1128 | Accident | road_type | 2 | One way street | NaN |
| 1129 | Accident | road_type | 3 | Dual carriageway | NaN |
| 1130 | Accident | road_type | 6 | Single carriageway | NaN |
| 1131 | Accident | road_type | 7 | Slip road | NaN |
| 1132 | Accident | road_type | 9 | Unknown | NaN |
| 1133 | Accident | road_type | 12 | One way street/Slip road | NaN |
| 1134 | Accident | road_type | -1 | Data missing or out of range | NaN |
| 1135 | Accident | speed_limit | NaN | NaN | 20,30,40,50,60,70 are the only valid speed lim... |
| 1136 | Accident | speed_limit | -1 | Data missing or out of range | NaN |
| 1137 | Accident | speed_limit | 99 | unknown (self reported) | NaN |
| 1138 | Accident | junction_detail | 0 | Not at junction or within 20 metres | NaN |
| 1139 | Accident | junction_detail | 1 | Roundabout | NaN |
| 1140 | Accident | junction_detail | 2 | Mini-roundabout | NaN |
| 1141 | Accident | junction_detail | 3 | T or staggered junction | NaN |
| 1142 | Accident | junction_detail | 5 | Slip road | NaN |
| 1143 | Accident | junction_detail | 6 | Crossroads | NaN |
| 1144 | Accident | junction_detail | 7 | More than 4 arms (not roundabout) | NaN |
| 1145 | Accident | junction_detail | 8 | Private drive or entrance | NaN |
| 1146 | Accident | junction_detail | 9 | Other junction | NaN |
| 1147 | Accident | junction_detail | 99 | unknown (self reported) | NaN |
| 1148 | Accident | junction_detail | -1 | Data missing or out of range | NaN |
| 1149 | Accident | junction_control | 0 | Not at junction or within 20 metres | NaN |
| 1150 | Accident | junction_control | 1 | Authorised person | NaN |
| 1151 | Accident | junction_control | 2 | Auto traffic signal | NaN |
| 1152 | Accident | junction_control | 3 | Stop sign | NaN |
| 1153 | Accident | junction_control | 4 | Give way or uncontrolled | NaN |
| 1154 | Accident | junction_control | -1 | Data missing or out of range | NaN |
| 1155 | Accident | junction_control | 9 | unknown (self reported) | NaN |
| 1156 | Accident | second_road_class | 0 | Not at junction or within 20 metres | NaN |
| 1157 | Accident | second_road_class | 1 | Motorway | NaN |
| 1158 | Accident | second_road_class | 2 | A(M) | NaN |
| 1159 | Accident | second_road_class | 3 | A | NaN |
| 1160 | Accident | second_road_class | 4 | B | NaN |
| 1161 | Accident | second_road_class | 5 | C | NaN |
| 1162 | Accident | second_road_class | 6 | Unclassified | NaN |
| 1163 | Accident | second_road_number | 1 to 9999 | Number range | NaN |
| 1164 | Accident | second_road_number | -1 | Unknown | NaN |
| 1165 | Accident | second_road_number | 0 | first_road_class is C or Unclassified. These r... | NaN |
| 1166 | Accident | pedestrian_crossing_human_control | 0 | None within 50 metres | NaN |
| 1167 | Accident | pedestrian_crossing_human_control | 1 | Control by school crossing patrol | NaN |
| 1168 | Accident | pedestrian_crossing_human_control | 2 | Control by other authorised person | NaN |
| 1169 | Accident | pedestrian_crossing_human_control | -1 | Data missing or out of range | NaN |
| 1170 | Accident | pedestrian_crossing_human_control | 9 | unknown (self reported) | NaN |
| 1171 | Accident | pedestrian_crossing_physical_facilities | 0 | No physical crossing facilities within 50 metres | NaN |
| 1172 | Accident | pedestrian_crossing_physical_facilities | 1 | Zebra | NaN |
| 1173 | Accident | pedestrian_crossing_physical_facilities | 4 | Pelican, puffin, toucan or similar non-junctio... | NaN |
| 1174 | Accident | pedestrian_crossing_physical_facilities | 5 | Pedestrian phase at traffic signal junction | NaN |
| 1175 | Accident | pedestrian_crossing_physical_facilities | 7 | Footbridge or subway | NaN |
| 1176 | Accident | pedestrian_crossing_physical_facilities | 8 | Central refuge | NaN |
| 1177 | Accident | pedestrian_crossing_physical_facilities | -1 | Data missing or out of range | NaN |
| 1178 | Accident | pedestrian_crossing_physical_facilities | 9 | unknown (self reported) | NaN |
| 1179 | Accident | light_conditions | 1 | Daylight | NaN |
| 1180 | Accident | light_conditions | 4 | Darkness - lights lit | NaN |
| 1181 | Accident | light_conditions | 5 | Darkness - lights unlit | NaN |
| 1182 | Accident | light_conditions | 6 | Darkness - no lighting | NaN |
| 1183 | Accident | light_conditions | 7 | Darkness - lighting unknown | NaN |
| 1184 | Accident | light_conditions | -1 | Data missing or out of range | NaN |
| 1185 | Accident | weather_conditions | 1 | Fine no high winds | NaN |
| 1186 | Accident | weather_conditions | 2 | Raining no high winds | NaN |
| 1187 | Accident | weather_conditions | 3 | Snowing no high winds | NaN |
| 1188 | Accident | weather_conditions | 4 | Fine + high winds | NaN |
| 1189 | Accident | weather_conditions | 5 | Raining + high winds | NaN |
| 1190 | Accident | weather_conditions | 6 | Snowing + high winds | NaN |
| 1191 | Accident | weather_conditions | 7 | Fog or mist | NaN |
| 1192 | Accident | weather_conditions | 8 | Other | NaN |
| 1193 | Accident | weather_conditions | 9 | Unknown | NaN |
| 1194 | Accident | weather_conditions | -1 | Data missing or out of range | NaN |
| 1195 | Accident | road_surface_conditions | 1 | Dry | NaN |
| 1196 | Accident | road_surface_conditions | 2 | Wet or damp | NaN |
| 1197 | Accident | road_surface_conditions | 3 | Snow | NaN |
| 1198 | Accident | road_surface_conditions | 4 | Frost or ice | NaN |
| 1199 | Accident | road_surface_conditions | 5 | Flood over 3cm. deep | NaN |
| 1200 | Accident | road_surface_conditions | 6 | Oil or diesel | NaN |
| 1201 | Accident | road_surface_conditions | 7 | Mud | NaN |
| 1202 | Accident | road_surface_conditions | -1 | Data missing or out of range | NaN |
| 1203 | Accident | road_surface_conditions | 9 | unknown (self reported) | NaN |
| 1204 | Accident | special_conditions_at_site | 0 | None | NaN |
| 1205 | Accident | special_conditions_at_site | 1 | Auto traffic signal - out | NaN |
| 1206 | Accident | special_conditions_at_site | 2 | Auto signal part defective | NaN |
| 1207 | Accident | special_conditions_at_site | 3 | Road sign or marking defective or obscured | NaN |
| 1208 | Accident | special_conditions_at_site | 4 | Roadworks | NaN |
| 1209 | Accident | special_conditions_at_site | 5 | Road surface defective | NaN |
| 1210 | Accident | special_conditions_at_site | 6 | Oil or diesel | NaN |
| 1211 | Accident | special_conditions_at_site | 7 | Mud | NaN |
| 1212 | Accident | special_conditions_at_site | -1 | Data missing or out of range | NaN |
| 1213 | Accident | special_conditions_at_site | 9 | unknown (self reported) | NaN |
| 1214 | Accident | carriageway_hazards | 0 | None | NaN |
| 1215 | Accident | carriageway_hazards | 1 | Vehicle load on road | NaN |
| 1216 | Accident | carriageway_hazards | 2 | Other object on road | NaN |
| 1217 | Accident | carriageway_hazards | 3 | Previous accident | NaN |
| 1218 | Accident | carriageway_hazards | 4 | Dog on road | NaN |
| 1219 | Accident | carriageway_hazards | 5 | Other animal on road | NaN |
| 1220 | Accident | carriageway_hazards | 6 | Pedestrian in carriageway - not injured | NaN |
| 1221 | Accident | carriageway_hazards | 7 | Any animal in carriageway (except ridden horse) | NaN |
| 1222 | Accident | carriageway_hazards | -1 | Data missing or out of range | NaN |
| 1223 | Accident | carriageway_hazards | 9 | unknown (self reported) | NaN |
| 1224 | Accident | urban_or_rural_area | 1 | Urban | field introduced in 1994 |
| 1225 | Accident | urban_or_rural_area | 2 | Rural | field introduced in 1994 |
| 1226 | Accident | urban_or_rural_area | 3 | Unallocated | field introduced in 1994 |
| 1227 | Accident | urban_or_rural_area | -1 | Data missing or out of range | field introduced in 1994 |
| 1228 | Accident | did_police_officer_attend_scene_of_accident | 1 | Yes | NaN |
| 1229 | Accident | did_police_officer_attend_scene_of_accident | 2 | No | NaN |
| 1230 | Accident | did_police_officer_attend_scene_of_accident | 3 | No - accident was reported using a self comple... | NaN |
| 1231 | Accident | did_police_officer_attend_scene_of_accident | -1 | Data missing or out of range | NaN |
| 1232 | Accident | trunk_road_flag | 1 | Trunk (Roads managed by Highways England) | NaN |
| 1233 | Accident | trunk_road_flag | 2 | Non-trunk | NaN |
| 1234 | Accident | trunk_road_flag | -1 | Data missing or out of range | NaN |
| 1235 | Accident | lsoa_of_accident_location | NaN | NaN | England and Wales only. See Office for Nationa... |
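Many coded fields in the guide above reserve -1 for "Data missing or out of range". The following is a minimal sketch of turning that sentinel into NaN so that pandas treats it as missing. The frame `demo` is a hypothetical stand-in for a few columns of `characteristics`, and the column list is an illustrative choice rather than the notebook's actual preprocessing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for a few columns of the accident table.
demo = pd.DataFrame({
    "light_conditions": [1, 4, -1, 6],
    "weather_conditions": [1, -1, 2, 9],
    "road_surface_conditions": [1, 2, 2, -1],
})

# -1 is documented as "Data missing or out of range" for these fields.
coded_cols = ["light_conditions", "weather_conditions", "road_surface_conditions"]
cleaned = demo.copy()
cleaned[coded_cols] = cleaned[coded_cols].replace(-1, np.nan)

# Share of missing values per column after the replacement.
missing_share = cleaned[coded_cols].isna().mean()
print(missing_share)
```

Codes such as 9 ("unknown (self reported)") are left untouched here; whether they should also count as missing is a separate modelling decision.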
The casualty dataset holds the most important features for this project, because they describe each casualty and, above all, the severity of the injury (casualty_severity). It also contains descriptive features about the person, such as sex, age and home area type, and the role they had in traffic during the incident (driver or rider, passenger, or pedestrian).
casualty.head()
| accident_index | accident_year | accident_reference | vehicle_reference | casualty_reference | casualty_class | sex_of_casualty | age_of_casualty | age_band_of_casualty | casualty_severity | pedestrian_location | pedestrian_movement | car_passenger | bus_or_coach_passenger | pedestrian_road_maintenance_worker | casualty_type | casualty_home_area_type | casualty_imd_decile |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020010219808 | 2020 | 010219808 | 1 | 1 | 3 | 1 | 31 | 6 | 3 | 9 | 5 | 0 | 0 | 0 | 0 | 1 | 4 |
| 2020010220496 | 2020 | 010220496 | 1 | 1 | 3 | 2 | 2 | 1 | 3 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 2 |
| 2020010220496 | 2020 | 010220496 | 1 | 2 | 3 | 2 | 4 | 1 | 3 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 2 |
| 2020010228005 | 2020 | 010228005 | 1 | 1 | 3 | 1 | 23 | 5 | 3 | 5 | 9 | 0 | 0 | 0 | 0 | 1 | 3 |
| 2020010228006 | 2020 | 010228006 | 1 | 1 | 3 | 1 | 47 | 8 | 2 | 4 | 1 | 0 | 0 | 0 | 0 | 1 | 3 |
casualty_ = ref.iloc[1465:]
casualty_
| | table | field name | code/format | label | note |
|---|---|---|---|---|---|
| 1465 | Casualty | accident_index | NaN | NaN | unique value for each accident. The accident_i... |
| 1466 | Casualty | accident_year | NaN | NaN | NaN |
| 1467 | Casualty | accident_reference | NaN | NaN | In year id used by the police to reference a c... |
| 1468 | Casualty | vehicle_reference | NaN | NaN | unique value for each vehicle in a singular ac... |
| 1469 | Casualty | casualty_reference | NaN | NaN | unique value for each casualty in a singular a... |
| 1470 | Casualty | casualty_class | 1 | Driver or rider | NaN |
| 1471 | Casualty | casualty_class | 2 | Passenger | NaN |
| 1472 | Casualty | casualty_class | 3 | Pedestrian | NaN |
| 1473 | Casualty | sex_of_casualty | 1 | Male | NaN |
| 1474 | Casualty | sex_of_casualty | 2 | Female | NaN |
| 1475 | Casualty | sex_of_casualty | 9 | unknown (self reported) | NaN |
| 1476 | Casualty | sex_of_casualty | -1 | Data missing or out of range | NaN |
| 1477 | Casualty | age_of_casualty | NaN | NaN | NaN |
| 1478 | Casualty | age_of_casualty | -1 | Data missing or out of range | NaN |
| 1479 | Casualty | age_band_of_casualty | 1 | 0 - 5 | NaN |
| 1480 | Casualty | age_band_of_casualty | 2 | 6 - 10 | NaN |
| 1481 | Casualty | age_band_of_casualty | 3 | 11 - 15 | NaN |
| 1482 | Casualty | age_band_of_casualty | 4 | 16 - 20 | NaN |
| 1483 | Casualty | age_band_of_casualty | 5 | 21 - 25 | NaN |
| 1484 | Casualty | age_band_of_casualty | 6 | 26 - 35 | NaN |
| 1485 | Casualty | age_band_of_casualty | 7 | 36 - 45 | NaN |
| 1486 | Casualty | age_band_of_casualty | 8 | 46 - 55 | NaN |
| 1487 | Casualty | age_band_of_casualty | 9 | 56 - 65 | NaN |
| 1488 | Casualty | age_band_of_casualty | 10 | 66 - 75 | NaN |
| 1489 | Casualty | age_band_of_casualty | 11 | Over 75 | NaN |
| 1490 | Casualty | age_band_of_casualty | -1 | Data missing or out of range | NaN |
| 1491 | Casualty | casualty_severity | 1 | Fatal | NaN |
| 1492 | Casualty | casualty_severity | 2 | Serious | NaN |
| 1493 | Casualty | casualty_severity | 3 | Slight | NaN |
| 1494 | Casualty | pedestrian_location | 0 | Not a Pedestrian | NaN |
| 1495 | Casualty | pedestrian_location | 1 | Crossing on pedestrian crossing facility | NaN |
| 1496 | Casualty | pedestrian_location | 2 | Crossing in zig-zag approach lines | NaN |
| 1497 | Casualty | pedestrian_location | 3 | Crossing in zig-zag exit lines | NaN |
| 1498 | Casualty | pedestrian_location | 4 | Crossing elsewhere within 50m. of pedestrian c... | NaN |
| 1499 | Casualty | pedestrian_location | 5 | In carriageway, crossing elsewhere | NaN |
| 1500 | Casualty | pedestrian_location | 6 | On footway or verge | NaN |
| 1501 | Casualty | pedestrian_location | 7 | On refuge, central island or central reservation | NaN |
| 1502 | Casualty | pedestrian_location | 8 | In centre of carriageway - not on refuge, isla... | NaN |
| 1503 | Casualty | pedestrian_location | 9 | In carriageway, not crossing | NaN |
| 1504 | Casualty | pedestrian_location | 10 | Unknown or other | NaN |
| 1505 | Casualty | pedestrian_location | -1 | Data missing or out of range | NaN |
| 1506 | Casualty | pedestrian_movement | 0 | Not a Pedestrian | NaN |
| 1507 | Casualty | pedestrian_movement | 1 | Crossing from driver's nearside | NaN |
| 1508 | Casualty | pedestrian_movement | 2 | Crossing from nearside - masked by parked or s... | NaN |
| 1509 | Casualty | pedestrian_movement | 3 | Crossing from driver's offside | NaN |
| 1510 | Casualty | pedestrian_movement | 4 | Crossing from offside - masked by parked or s... | NaN |
| 1511 | Casualty | pedestrian_movement | 5 | In carriageway, stationary - not crossing (st... | NaN |
| 1512 | Casualty | pedestrian_movement | 6 | In carriageway, stationary - not crossing (st... | NaN |
| 1513 | Casualty | pedestrian_movement | 7 | Walking along in carriageway, facing traffic | NaN |
| 1514 | Casualty | pedestrian_movement | 8 | Walking along in carriageway, back to traffic | NaN |
| 1515 | Casualty | pedestrian_movement | 9 | Unknown or other | NaN |
| 1516 | Casualty | pedestrian_movement | -1 | Data missing or out of range | NaN |
| 1517 | Casualty | car_passenger | 0 | Not car passenger | NaN |
| 1518 | Casualty | car_passenger | 1 | Front seat passenger | NaN |
| 1519 | Casualty | car_passenger | 2 | Rear seat passenger | NaN |
| 1520 | Casualty | car_passenger | 9 | unknown (self reported) | NaN |
| 1521 | Casualty | car_passenger | -1 | Data missing or out of range | NaN |
| 1522 | Casualty | bus_or_coach_passenger | 0 | Not a bus or coach passenger | NaN |
| 1523 | Casualty | bus_or_coach_passenger | 1 | Boarding | NaN |
| 1524 | Casualty | bus_or_coach_passenger | 2 | Alighting | NaN |
| 1525 | Casualty | bus_or_coach_passenger | 3 | Standing passenger | NaN |
| 1526 | Casualty | bus_or_coach_passenger | 4 | Seated passenger | NaN |
| 1527 | Casualty | bus_or_coach_passenger | 9 | unknown (self reported) | NaN |
| 1528 | Casualty | bus_or_coach_passenger | -1 | Data missing or out of range | NaN |
| 1529 | Casualty | pedestrian_road_maintenance_worker | 0 | No / Not applicable | NaN |
| 1530 | Casualty | pedestrian_road_maintenance_worker | 1 | Yes | NaN |
| 1531 | Casualty | pedestrian_road_maintenance_worker | 2 | Not Known | NaN |
| 1532 | Casualty | pedestrian_road_maintenance_worker | 3 | Probable | 2005 specification only |
| 1533 | Casualty | pedestrian_road_maintenance_worker | -1 | Data missing or out of range | NaN |
| 1534 | Casualty | casualty_type | 0 | Pedestrian | NaN |
| 1535 | Casualty | casualty_type | 1 | Cyclist | NaN |
| 1536 | Casualty | casualty_type | 2 | Motorcycle 50cc and under rider or passenger | NaN |
| 1537 | Casualty | casualty_type | 3 | Motorcycle 125cc and under rider or passenger | introduced in 1999 specification |
| 1538 | Casualty | casualty_type | 4 | Motorcycle over 125cc and up to 500cc rider or... | introduced in 2005 specification |
| 1539 | Casualty | casualty_type | 5 | Motorcycle over 500cc rider or passenger | introduced in 2005 specification |
| 1540 | Casualty | casualty_type | 8 | Taxi/Private hire car occupant | introduced in 2005 specification |
| 1541 | Casualty | casualty_type | 9 | Car occupant | introduced in 2005 specification |
| 1542 | Casualty | casualty_type | 10 | Minibus (8 - 16 passenger seats) occupant | introduced in 1999 specification |
| 1543 | Casualty | casualty_type | 11 | Bus or coach occupant (17 or more pass seats) | NaN |
| 1544 | Casualty | casualty_type | 16 | Horse rider | introduced in 1999 specification |
| 1545 | Casualty | casualty_type | 17 | Agricultural vehicle occupant | introduced in 1999 specification |
| 1546 | Casualty | casualty_type | 18 | Tram occupant | introduced in 1999 specification |
| 1547 | Casualty | casualty_type | 19 | Van / Goods vehicle (3.5 tonnes mgw or under) ... | NaN |
| 1548 | Casualty | casualty_type | 20 | Goods vehicle (over 3.5t. and under 7.5t.) occ... | introduced in 1999 specification |
| 1549 | Casualty | casualty_type | 21 | Goods vehicle (7.5 tonnes mgw and over) occupant | introduced in 1999 specification |
| 1550 | Casualty | casualty_type | 22 | Mobility scooter rider | introduced in 2011 specification |
| 1551 | Casualty | casualty_type | 23 | Electric motorcycle rider or passenger | introduced in 2011 specification |
| 1552 | Casualty | casualty_type | 90 | Other vehicle occupant | introduced in 2011 specification |
| 1553 | Casualty | casualty_type | 97 | Motorcycle - unknown cc rider or passenger | introduced in 2011 specification |
| 1554 | Casualty | casualty_type | 98 | Goods vehicle (unknown weight) occupant | introduced in 2011 specification |
| 1555 | Casualty | casualty_type | 99 | Unknown vehicle type (self rep only) | introduced in 2011 specification |
| 1556 | Casualty | casualty_type | 103 | Motorcycle - Scooter (1979-1998) | dropped in 1999 specification |
| 1557 | Casualty | casualty_type | 104 | Motorcycle (1979-1998) | dropped in 1999 specification |
| 1558 | Casualty | casualty_type | 105 | Motorcycle - Combination (1979-1998) | dropped in 1999 specification |
| 1559 | Casualty | casualty_type | 106 | Motorcycle over 125cc (1999-2004) | dropped in 2005 specification |
| 1560 | Casualty | casualty_type | 108 | Taxi (excluding private hire cars) (1979-2004) | dropped in 2005 specification |
| 1561 | Casualty | casualty_type | 109 | Car (including private hire cars) (1979-2004) | dropped in 2005 specification |
| 1562 | Casualty | casualty_type | 110 | Minibus/Motor caravan (1979-1998) | dropped in 1999 specification |
| 1563 | Casualty | casualty_type | 113 | Goods over 3.5 tonnes (1979-1998) | dropped in 1999 specification |
| 1564 | Casualty | casualty_imd_decile | 1 | Most deprived 10% | field introduced in 2016 |
| 1565 | Casualty | casualty_imd_decile | 2 | More deprived 10-20% | field introduced in 2016 |
| 1566 | Casualty | casualty_imd_decile | 3 | More deprived 20-30% | field introduced in 2016 |
| 1567 | Casualty | casualty_imd_decile | 4 | More deprived 30-40% | field introduced in 2016 |
| 1568 | Casualty | casualty_imd_decile | 5 | More deprived 40-50% | field introduced in 2016 |
| 1569 | Casualty | casualty_imd_decile | 6 | Less deprived 40-50% | field introduced in 2016 |
| 1570 | Casualty | casualty_imd_decile | 7 | Less deprived 30-40% | field introduced in 2016 |
| 1571 | Casualty | casualty_imd_decile | 8 | Less deprived 20-30% | field introduced in 2016 |
| 1572 | Casualty | casualty_imd_decile | 9 | Less deprived 10-20% | field introduced in 2016 |
| 1573 | Casualty | casualty_imd_decile | 10 | Least deprived 10% | field introduced in 2016 |
| 1574 | Casualty | casualty_imd_decile | -1 | Data missing or out of range | field introduced in 2016 |
| 1575 | Casualty | casualty_home_area_type | 1 | Urban area | field introduced in 1999 |
| 1576 | Casualty | casualty_home_area_type | 2 | Small town | field introduced in 1999 |
| 1577 | Casualty | casualty_home_area_type | 3 | Rural | field introduced in 1999 |
| 1578 | Casualty | casualty_home_area_type | -1 | Data missing or out of range | field introduced in 1999 |
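The guide rows above can be turned into a code-to-label lookup so that numeric codes such as casualty_severity become readable. This is a minimal sketch under stated assumptions: `ref_demo` is a hypothetical excerpt that reuses the guide's column names (`table`, `field name`, `code/format`, `label`), `code_labels` is a helper introduced only for illustration, and the codes are assumed to be stored as text.

```python
import pandas as pd

# Hypothetical excerpt of the data guide, using the column names shown above.
ref_demo = pd.DataFrame({
    "table": ["Casualty"] * 6,
    "field name": ["casualty_severity"] * 3 + ["casualty_class"] * 3,
    "code/format": ["1", "2", "3", "1", "2", "3"],
    "label": ["Fatal", "Serious", "Slight",
              "Driver or rider", "Passenger", "Pedestrian"],
})


def code_labels(guide, table, field):
    """Return a {code: label} dict for one coded field of the guide."""
    rows = guide[(guide["table"] == table) & (guide["field name"] == field)]
    rows = rows.dropna(subset=["code/format", "label"])
    # Non-numeric entries (for example "1 to 9999") become NaN and are skipped.
    codes = pd.to_numeric(rows["code/format"], errors="coerce")
    keep = codes.notna()
    return {int(c): lab for c, lab in zip(codes[keep], rows.loc[keep, "label"])}


severity_labels = code_labels(ref_demo, "Casualty", "casualty_severity")

# Hypothetical casualty rows with raw severity codes.
casualty_demo = pd.DataFrame({"casualty_severity": [3, 3, 2, 1]})
casualty_demo["severity_label"] = casualty_demo["casualty_severity"].map(severity_labels)
print(casualty_demo)
```

Selecting guide rows by `table` and `field name` rather than by position keeps the lookup valid if the guide file gains or loses rows.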
The vehicle dataset describes the type and actions of each vehicle involved. What was the vehicle doing, and who was inside? These are questions that can be answered with this dataset. Its rows can be linked to the other two datasets through accident_index.
vehicles.head()
| accident_index | accident_year | accident_reference | vehicle_reference | vehicle_type | towing_and_articulation | vehicle_manoeuvre | vehicle_direction_from | vehicle_direction_to | vehicle_location_restricted_lane | junction_location | skidding_and_overturning | hit_object_in_carriageway | vehicle_leaving_carriageway | hit_object_off_carriageway | first_point_of_impact | vehicle_left_hand_drive | journey_purpose_of_driver | sex_of_driver | age_of_driver | age_band_of_driver | engine_capacity_cc | propulsion_code | age_of_vehicle | generic_make_model | driver_imd_decile | driver_home_area_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020010219808 | 2020 | 10219808 | 1 | 9 | 9 | 5 | 1 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 9 | 6 | 2 | 32 | 6 | 1968 | 2 | 6 | AUDI Q5 | 4 | 1 |
| 2020010220496 | 2020 | 10220496 | 1 | 9 | 0 | 4 | 2 | 6 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 1 | 2 | 1 | 45 | 7 | 1395 | 1 | 2 | AUDI A1 | 7 | 1 |
| 2020010228005 | 2020 | 10228005 | 1 | 9 | 0 | 18 | -1 | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 6 | 3 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
| 2020010228006 | 2020 | 10228006 | 1 | 8 | 0 | 18 | 1 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 44 | 7 | 1798 | 8 | 8 | TOYOTA PRIUS | 2 | 1 |
| 2020010228011 | 2020 | 10228011 | 1 | 9 | 0 | 18 | 3 | 7 | 9 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 6 | 1 | 20 | 4 | 2993 | 2 | 4 | BMW 4 SERIES | -1 | -1 |
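Linking the three datasets through accident_index could look like the sketch below. The miniature frames and their values are hypothetical. Joining casualties to vehicles on vehicle_reference as well as accident_index is an assumption based on the guide's description of that field as unique per vehicle within an accident. In the notebook accident_index appears as the index, so a `reset_index()` would be needed before merging on it as a column.

```python
import pandas as pd

# Hypothetical miniature tables; values are made up for illustration.
casualty_demo = pd.DataFrame({
    "accident_index": ["A1", "A2", "A2"],
    "vehicle_reference": [1, 1, 2],
    "casualty_severity": [3, 2, 3],
})
vehicles_demo = pd.DataFrame({
    "accident_index": ["A1", "A2", "A2"],
    "vehicle_reference": [1, 1, 2],
    "vehicle_type": [9, 9, 1],
})
accident_demo = pd.DataFrame({
    "accident_index": ["A1", "A2"],
    "light_conditions": [1, 4],
})

# One row per casualty: attach the vehicle the casualty is recorded against,
# then the accident-level circumstances.
merged = (
    casualty_demo
    .merge(vehicles_demo, on=["accident_index", "vehicle_reference"],
           how="left", validate="many_to_one")
    .merge(accident_demo, on="accident_index", how="left",
           validate="many_to_one")
)
print(merged)
```

The `validate="many_to_one"` argument makes pandas raise an error if a key is duplicated on the right-hand side, which guards against silently multiplying casualty rows.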
vehicles_ = ref.iloc[1236:1459]
vehicles_
| | table | field name | code/format | label | note |
|---|---|---|---|---|---|
| 1236 | Vehicle | accident_index | NaN | NaN | unique value for each accident. The accident_i... |
| 1237 | Vehicle | accident_year | NaN | NaN | NaN |
| 1238 | Vehicle | accident_reference | NaN | NaN | In year id used by the police to reference a c... |
| 1239 | Vehicle | vehicle_reference | NaN | NaN | unique value for each vehicle in a singular ac... |
| 1240 | Vehicle | vehicle_type | 1 | Pedal cycle | NaN |
| 1241 | Vehicle | vehicle_type | 2 | Motorcycle 50cc and under | NaN |
| 1242 | Vehicle | vehicle_type | 3 | Motorcycle 125cc and under | category introduced in 1999 specification |
| 1243 | Vehicle | vehicle_type | 4 | Motorcycle over 125cc and up to 500cc | category introduced in 2005 specification |
| 1244 | Vehicle | vehicle_type | 5 | Motorcycle over 500cc | category introduced in 2005 specification |
| 1245 | Vehicle | vehicle_type | 8 | Taxi/Private hire car | category introduced in 2005 specification |
| 1246 | Vehicle | vehicle_type | 9 | Car | category introduced in 2005 specification |
| 1247 | Vehicle | vehicle_type | 10 | Minibus (8 - 16 passenger seats) | category introduced in 1999 specification |
| 1248 | Vehicle | vehicle_type | 11 | Bus or coach (17 or more pass seats) | NaN |
| 1249 | Vehicle | vehicle_type | 16 | Ridden horse | category introduced in 1999 specification |
| 1250 | Vehicle | vehicle_type | 17 | Agricultural vehicle | category introduced in 1999 specification |
| 1251 | Vehicle | vehicle_type | 18 | Tram | category introduced in 1999 specification |
| 1252 | Vehicle | vehicle_type | 19 | Van / Goods 3.5 tonnes mgw or under | NaN |
| 1253 | Vehicle | vehicle_type | 20 | Goods over 3.5t. and under 7.5t | category introduced in 1999 specification |
| 1254 | Vehicle | vehicle_type | 21 | Goods 7.5 tonnes mgw and over | category introduced in 1999 specification |
| 1255 | Vehicle | vehicle_type | 22 | Mobility scooter | cateogry introduced in 2011 specification |
| 1256 | Vehicle | vehicle_type | 23 | Electric motorcycle | cateogry introduced in 2011 specification |
| 1257 | Vehicle | vehicle_type | 90 | Other vehicle | cateogry introduced in 2011 specification |
| 1258 | Vehicle | vehicle_type | 97 | Motorcycle - unknown cc | cateogry introduced in 2011 specification |
| 1259 | Vehicle | vehicle_type | 98 | Goods vehicle - unknown weight | cateogry introduced in 2011 specification |
| 1260 | Vehicle | vehicle_type | 99 | Unknown vehicle type (self rep only) | cateogry introduced in 2011 specification |
| 1261 | Vehicle | vehicle_type | 103 | Motorcycle - Scooter (1979-1998) | cateogory discontinued in 1999 specification |
| 1262 | Vehicle | vehicle_type | 104 | Motorcycle (1979-1998) | cateogory discontinued in 1999 specification |
| 1263 | Vehicle | vehicle_type | 105 | Motorcycle - Combination (1979-1998) | cateogory discontinued in 1999 specification |
| 1264 | Vehicle | vehicle_type | 106 | Motorcycle over 125cc (1999-2004) | cateogory discontinued in 2005 specification |
| 1265 | Vehicle | vehicle_type | 108 | Taxi (excluding private hire cars) (1979-2004) | cateogory discontinued in 2005 specification |
| 1266 | Vehicle | vehicle_type | 109 | Car (including private hire cars) (1979-2004) | cateogory discontinued in 2005 specification |
| 1267 | Vehicle | vehicle_type | 110 | Minibus/Motor caravan (1979-1998) | cateogory discontinued in 1999 specification |
| 1268 | Vehicle | vehicle_type | 113 | Goods over 3.5 tonnes (1979-1998) | cateogory discontinued in 1999 specification |
| 1269 | Vehicle | vehicle_type | -1 | Data missing or out of range | NaN |
| 1270 | Vehicle | towing_and_articulation | 0 | No tow/articulation | NaN |
| 1271 | Vehicle | towing_and_articulation | 1 | Articulated vehicle | NaN |
| 1272 | Vehicle | towing_and_articulation | 2 | Double or multiple trailer | NaN |
| 1273 | Vehicle | towing_and_articulation | 3 | Caravan | NaN |
| 1274 | Vehicle | towing_and_articulation | 4 | Single trailer | NaN |
| 1275 | Vehicle | towing_and_articulation | 5 | Other tow | NaN |
| 1276 | Vehicle | towing_and_articulation | 9 | unknown (self reported) | NaN |
| 1277 | Vehicle | towing_and_articulation | -1 | Data missing or out of range | NaN |
| 1278 | Vehicle | vehicle_manoeuvre | 1 | Reversing | NaN |
| 1279 | Vehicle | vehicle_manoeuvre | 2 | Parked | NaN |
| 1280 | Vehicle | vehicle_manoeuvre | 3 | Waiting to go - held up | NaN |
| 1281 | Vehicle | vehicle_manoeuvre | 4 | Slowing or stopping | NaN |
| 1282 | Vehicle | vehicle_manoeuvre | 5 | Moving off | NaN |
| 1283 | Vehicle | vehicle_manoeuvre | 6 | U-turn | NaN |
| 1284 | Vehicle | vehicle_manoeuvre | 7 | Turning left | NaN |
| 1285 | Vehicle | vehicle_manoeuvre | 8 | Waiting to turn left | NaN |
| 1286 | Vehicle | vehicle_manoeuvre | 9 | Turning right | NaN |
| 1287 | Vehicle | vehicle_manoeuvre | 10 | Waiting to turn right | NaN |
| 1288 | Vehicle | vehicle_manoeuvre | 11 | Changing lane to left | NaN |
| 1289 | Vehicle | vehicle_manoeuvre | 12 | Changing lane to right | NaN |
| 1290 | Vehicle | vehicle_manoeuvre | 13 | Overtaking moving vehicle - offside | NaN |
| 1291 | Vehicle | vehicle_manoeuvre | 14 | Overtaking static vehicle - offside | NaN |
| 1292 | Vehicle | vehicle_manoeuvre | 15 | Overtaking - nearside | NaN |
| 1293 | Vehicle | vehicle_manoeuvre | 16 | Going ahead left-hand bend | NaN |
| 1294 | Vehicle | vehicle_manoeuvre | 17 | Going ahead right-hand bend | NaN |
| 1295 | Vehicle | vehicle_manoeuvre | 18 | Going ahead other | NaN |
| 1296 | Vehicle | vehicle_manoeuvre | 99 | unknown (self reported) | NaN |
| 1297 | Vehicle | vehicle_manoeuvre | -1 | Data missing or out of range | NaN |
| 1298 | Vehicle | vehicle_direction_from | 0 | Parked | both vehicle_direction_from and vehicle_direct... |
| 1299 | Vehicle | vehicle_direction_from | 1 | North | NaN |
| 1300 | Vehicle | vehicle_direction_from | 2 | North East | NaN |
| 1301 | Vehicle | vehicle_direction_from | 3 | East | NaN |
| 1302 | Vehicle | vehicle_direction_from | 4 | South East | NaN |
| 1303 | Vehicle | vehicle_direction_from | 5 | South East | NaN |
| 1304 | Vehicle | vehicle_direction_from | 6 | South West | NaN |
| 1305 | Vehicle | vehicle_direction_from | 7 | West | NaN |
| 1306 | Vehicle | vehicle_direction_from | 8 | North West | NaN |
| 1307 | Vehicle | vehicle_direction_from | 9 | unknown (self reported) | both vehicle_direction_from and vehicle_direct... |
| 1308 | Vehicle | vehicle_direction_to | 0 | Parked | both vehicle_direction_from and vehicle_direct... |
| 1309 | Vehicle | vehicle_direction_to | 1 | North | NaN |
| 1310 | Vehicle | vehicle_direction_to | 2 | North East | NaN |
| 1311 | Vehicle | vehicle_direction_to | 3 | East | NaN |
| 1312 | Vehicle | vehicle_direction_to | 4 | South East | NaN |
| 1313 | Vehicle | vehicle_direction_to | 5 | South East | NaN |
| 1314 | Vehicle | vehicle_direction_to | 6 | South West | NaN |
| 1315 | Vehicle | vehicle_direction_to | 7 | West | NaN |
| 1316 | Vehicle | vehicle_direction_to | 8 | North West | NaN |
| 1317 | Vehicle | vehicle_direction_to | 9 | unknown (self reported) | both vehicle_direction_from and vehicle_direct... |
| 1318 | Vehicle | vehicle_location_restricted_lane | 0 | On main c'way - not in restricted lane | NaN |
| 1319 | Vehicle | vehicle_location_restricted_lane | 1 | Tram/Light rail track | NaN |
| 1320 | Vehicle | vehicle_location_restricted_lane | 2 | Bus lane | NaN |
| 1321 | Vehicle | vehicle_location_restricted_lane | 3 | Busway (including guided busway) | NaN |
| 1322 | Vehicle | vehicle_location_restricted_lane | 4 | Cycle lane (on main carriageway) | NaN |
| 1323 | Vehicle | vehicle_location_restricted_lane | 5 | Cycleway or shared use footway (not part of m... | NaN |
| 1324 | Vehicle | vehicle_location_restricted_lane | 6 | On lay-by or hard shoulder | NaN |
| 1325 | Vehicle | vehicle_location_restricted_lane | 7 | Entering lay-by or hard shoulder | NaN |
| 1326 | Vehicle | vehicle_location_restricted_lane | 8 | Leaving lay-by or hard shoulder | NaN |
| 1327 | Vehicle | vehicle_location_restricted_lane | 9 | Footway (pavement) | NaN |
| 1328 | Vehicle | vehicle_location_restricted_lane | 10 | Not on carriageway | NaN |
| 1329 | Vehicle | vehicle_location_restricted_lane | 99 | unknown (self reported) | NaN |
| 1330 | Vehicle | vehicle_location_restricted_lane | -1 | Data missing or out of range | NaN |
| 1331 | Vehicle | junction_location | 0 | Not at or within 20 metres of junction | NaN |
| 1332 | Vehicle | junction_location | 1 | Approaching junction or waiting/parked at junc... | NaN |
| 1333 | Vehicle | junction_location | 2 | Cleared junction or waiting/parked at junction... | NaN |
| 1334 | Vehicle | junction_location | 3 | Leaving roundabout | NaN |
| 1335 | Vehicle | junction_location | 4 | Entering roundabout | NaN |
| 1336 | Vehicle | junction_location | 5 | Leaving main road | NaN |
| 1337 | Vehicle | junction_location | 6 | Entering main road | NaN |
| 1338 | Vehicle | junction_location | 7 | Entering from slip road | NaN |
| 1339 | Vehicle | junction_location | 8 | Mid Junction - on roundabout or on main road | NaN |
| 1340 | Vehicle | junction_location | 9 | unknown (self reported) | NaN |
| 1341 | Vehicle | junction_location | -1 | Data missing or out of range | NaN |
| 1342 | Vehicle | skidding_and_overturning | 0 | None | NaN |
| 1343 | Vehicle | skidding_and_overturning | 1 | Skidded | NaN |
| 1344 | Vehicle | skidding_and_overturning | 2 | Skidded and overturned | NaN |
| 1345 | Vehicle | skidding_and_overturning | 3 | Jackknifed | NaN |
| 1346 | Vehicle | skidding_and_overturning | 4 | Jackknifed and overturned | NaN |
| 1347 | Vehicle | skidding_and_overturning | 5 | Overturned | NaN |
| 1348 | Vehicle | skidding_and_overturning | 9 | unknown (self reported) | NaN |
| 1349 | Vehicle | skidding_and_overturning | -1 | Data missing or out of range | NaN |
| 1350 | Vehicle | hit_object_in_carriageway | 0 | None | NaN |
| 1351 | Vehicle | hit_object_in_carriageway | 1 | Previous accident | NaN |
| 1352 | Vehicle | hit_object_in_carriageway | 2 | Road works | NaN |
| 1353 | Vehicle | hit_object_in_carriageway | 4 | Parked vehicle | NaN |
| 1354 | Vehicle | hit_object_in_carriageway | 5 | Bridge (roof) | NaN |
| 1355 | Vehicle | hit_object_in_carriageway | 6 | Bridge (side) | NaN |
| 1356 | Vehicle | hit_object_in_carriageway | 7 | Bollard or refuge | NaN |
| 1357 | Vehicle | hit_object_in_carriageway | 8 | Open door of vehicle | NaN |
| 1358 | Vehicle | hit_object_in_carriageway | 9 | Central island of roundabout | NaN |
| 1359 | Vehicle | hit_object_in_carriageway | 10 | Kerb | NaN |
| 1360 | Vehicle | hit_object_in_carriageway | 11 | Other object | NaN |
| 1361 | Vehicle | hit_object_in_carriageway | 12 | Any animal (except ridden horse) | NaN |
| 1362 | Vehicle | hit_object_in_carriageway | 99 | unknown (self reported) | NaN |
| 1363 | Vehicle | hit_object_in_carriageway | -1 | Data missing or out of range | NaN |
| 1364 | Vehicle | vehicle_leaving_carriageway | 0 | Did not leave carriageway | NaN |
| 1365 | Vehicle | vehicle_leaving_carriageway | 1 | Nearside | NaN |
| 1366 | Vehicle | vehicle_leaving_carriageway | 2 | Nearside and rebounded | NaN |
| 1367 | Vehicle | vehicle_leaving_carriageway | 3 | Straight ahead at junction | NaN |
| 1368 | Vehicle | vehicle_leaving_carriageway | 4 | Offside on to central reservation | NaN |
| 1369 | Vehicle | vehicle_leaving_carriageway | 5 | Offside on to centrl res + rebounded | NaN |
| 1370 | Vehicle | vehicle_leaving_carriageway | 6 | Offside - crossed central reservation | NaN |
| 1371 | Vehicle | vehicle_leaving_carriageway | 7 | Offside | NaN |
| 1372 | Vehicle | vehicle_leaving_carriageway | 8 | Offside and rebounded | NaN |
| 1373 | Vehicle | vehicle_leaving_carriageway | 9 | unknown (self reported) | NaN |
| 1374 | Vehicle | vehicle_leaving_carriageway | -1 | Data missing or out of range | NaN |
| 1375 | Vehicle | hit_object_off_carriageway | 0 | None | NaN |
| 1376 | Vehicle | hit_object_off_carriageway | 1 | Road sign or traffic signal | NaN |
| 1377 | Vehicle | hit_object_off_carriageway | 2 | Lamp post | NaN |
| 1378 | Vehicle | hit_object_off_carriageway | 3 | Telegraph or electricity pole | NaN |
| 1379 | Vehicle | hit_object_off_carriageway | 4 | Tree | NaN |
| 1380 | Vehicle | hit_object_off_carriageway | 5 | Bus stop or bus shelter | NaN |
| 1381 | Vehicle | hit_object_off_carriageway | 6 | Central crash barrier | NaN |
| 1382 | Vehicle | hit_object_off_carriageway | 7 | Near/Offside crash barrier | NaN |
| 1383 | Vehicle | hit_object_off_carriageway | 8 | Submerged in water | NaN |
| 1384 | Vehicle | hit_object_off_carriageway | 9 | Entered ditch | NaN |
| 1385 | Vehicle | hit_object_off_carriageway | 10 | Other permanent object | NaN |
| 1386 | Vehicle | hit_object_off_carriageway | 11 | Wall or fence | NaN |
| 1387 | Vehicle | hit_object_off_carriageway | 99 | unknown (self reported) | NaN |
| 1388 | Vehicle | hit_object_off_carriageway | -1 | Data missing or out of range | NaN |
| 1389 | Vehicle | first_point_of_impact | 0 | Did not impact | NaN |
| 1390 | Vehicle | first_point_of_impact | 1 | Front | NaN |
| 1391 | Vehicle | first_point_of_impact | 2 | Back | NaN |
| 1392 | Vehicle | first_point_of_impact | 3 | Offside | NaN |
| 1393 | Vehicle | first_point_of_impact | 4 | Nearside | NaN |
| 1394 | Vehicle | first_point_of_impact | 9 | unknown (self reported) | NaN |
| 1395 | Vehicle | first_point_of_impact | -1 | Data missing or out of range | NaN |
| 1396 | Vehicle | vehicle_left_hand_drive | 1 | No | NaN |
| 1397 | Vehicle | vehicle_left_hand_drive | 2 | Yes | NaN |
| 1398 | Vehicle | vehicle_left_hand_drive | 9 | Unknown | NaN |
| 1399 | Vehicle | vehicle_left_hand_drive | -1 | Data missing or out of range | NaN |
| 1400 | Vehicle | journey_purpose_of_driver | 1 | Journey as part of work | NaN |
| 1401 | Vehicle | journey_purpose_of_driver | 2 | Commuting to/from work | NaN |
| 1402 | Vehicle | journey_purpose_of_driver | 3 | Taking pupil to/from school | NaN |
| 1403 | Vehicle | journey_purpose_of_driver | 4 | Pupil riding to/from school | NaN |
| 1404 | Vehicle | journey_purpose_of_driver | 5 | Other | NaN |
| 1405 | Vehicle | journey_purpose_of_driver | 6 | Not known | NaN |
| 1406 | Vehicle | journey_purpose_of_driver | 15 | Other/Not known | 2005 specification only |
| 1407 | Vehicle | journey_purpose_of_driver | -1 | Data missing or out of range | NaN |
| 1408 | Vehicle | sex_of_driver | 1 | Male | NaN |
| 1409 | Vehicle | sex_of_driver | 2 | Female | NaN |
| 1410 | Vehicle | sex_of_driver | 3 | Not known | NaN |
| 1411 | Vehicle | sex_of_driver | -1 | Data missing or out of range | NaN |
| 1412 | Vehicle | age_of_driver | NaN | NaN | NaN |
| 1413 | Vehicle | age_of_driver | -1 | Data missing or out of range | NaN |
| 1414 | Vehicle | age_band_of_driver | 1 | 0 - 5 | NaN |
| 1415 | Vehicle | age_band_of_driver | 2 | 6 - 10 | NaN |
| 1416 | Vehicle | age_band_of_driver | 3 | 11 - 15 | NaN |
| 1417 | Vehicle | age_band_of_driver | 4 | 16 - 20 | NaN |
| 1418 | Vehicle | age_band_of_driver | 5 | 21 - 25 | NaN |
| 1419 | Vehicle | age_band_of_driver | 6 | 26 - 35 | NaN |
| 1420 | Vehicle | age_band_of_driver | 7 | 36 - 45 | NaN |
| 1421 | Vehicle | age_band_of_driver | 8 | 46 - 55 | NaN |
| 1422 | Vehicle | age_band_of_driver | 9 | 56 - 65 | NaN |
| 1423 | Vehicle | age_band_of_driver | 10 | 66 - 75 | NaN |
| 1424 | Vehicle | age_band_of_driver | 11 | Over 75 | NaN |
| 1425 | Vehicle | age_band_of_driver | -1 | Data missing or out of range | NaN |
| 1426 | Vehicle | engine_capacity_cc | NaN | NaN | NaN |
| 1427 | Vehicle | engine_capacity_cc | -1 | Data missing or out of range | NaN |
| 1428 | Vehicle | propulsion_code | 1 | Petrol | NaN |
| 1429 | Vehicle | propulsion_code | 2 | Heavy oil | NaN |
| 1430 | Vehicle | propulsion_code | 3 | Electric | NaN |
| 1431 | Vehicle | propulsion_code | 4 | Steam | NaN |
| 1432 | Vehicle | propulsion_code | 5 | Gas | NaN |
| 1433 | Vehicle | propulsion_code | 6 | Petrol/Gas (LPG) | NaN |
| 1434 | Vehicle | propulsion_code | 7 | Gas/Bi-fuel | NaN |
| 1435 | Vehicle | propulsion_code | 8 | Hybrid electric | NaN |
| 1436 | Vehicle | propulsion_code | 9 | Gas Diesel | NaN |
| 1437 | Vehicle | propulsion_code | 10 | New fuel technology | NaN |
| 1438 | Vehicle | propulsion_code | 11 | Fuel cells | NaN |
| 1439 | Vehicle | propulsion_code | 12 | Electric diesel | NaN |
| 1440 | Vehicle | propulsion_code | -1 | Undefined | NaN |
| 1441 | Vehicle | age_of_vehicle | NaN | NaN | NaN |
| 1442 | Vehicle | generic_make_model | NaN | NaN | field introduced in 2020 |
| 1443 | Vehicle | generic_make_model | -1 | Data missing or out of range | field introduced in 2020 |
| 1444 | Vehicle | driver_imd_decile | 1 | Most deprived 10% | field introduced in 2016 |
| 1445 | Vehicle | driver_imd_decile | 2 | More deprived 10-20% | field introduced in 2016 |
| 1446 | Vehicle | driver_imd_decile | 3 | More deprived 20-30% | field introduced in 2016 |
| 1447 | Vehicle | driver_imd_decile | 4 | More deprived 30-40% | field introduced in 2016 |
| 1448 | Vehicle | driver_imd_decile | 5 | More deprived 40-50% | field introduced in 2016 |
| 1449 | Vehicle | driver_imd_decile | 6 | Less deprived 40-50% | field introduced in 2016 |
| 1450 | Vehicle | driver_imd_decile | 7 | Less deprived 30-40% | field introduced in 2016 |
| 1451 | Vehicle | driver_imd_decile | 8 | Less deprived 20-30% | field introduced in 2016 |
| 1452 | Vehicle | driver_imd_decile | 9 | Less deprived 10-20% | field introduced in 2016 |
| 1453 | Vehicle | driver_imd_decile | 10 | Least deprived 10% | field introduced in 2016 |
| 1454 | Vehicle | driver_imd_decile | -1 | Data missing or out of range | field introduced in 2016 |
| 1455 | Vehicle | driver_home_area_type | 1 | Urban area | field introduced in 1999 |
| 1456 | Vehicle | driver_home_area_type | 2 | Small town | field introduced in 1999 |
| 1457 | Vehicle | driver_home_area_type | 3 | Rural | field introduced in 1999 |
| 1458 | Vehicle | driver_home_area_type | -1 | Data missing or out of range | field introduced in 1999 |
Before displaying and linking the data, we can check for missing data points and for features that can be excluded.
A feature is excluded only if we can presume that dropping it would not significantly affect the accuracy of the model.
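As a minimal sketch of this check, assuming that the code -1 (as an integer or as the string '-1') marks "Data missing or out of range" as in the reference table, the missing rate per feature can be computed as the mean of a boolean mask. The helper name `missing_rate` and the toy values are hypothetical and only serve as illustration.

```python
import pandas as pd


def missing_rate(df: pd.DataFrame, missing_codes=(-1, "-1")) -> pd.DataFrame:
    """Return the percentage of entries per column that equal a missing-value code."""
    # isin() gives a boolean frame; the column mean of booleans is the share of True.
    rate = df.isin(list(missing_codes)).mean().mul(100).round(2)
    return rate.rename("missing_rate").rename_axis("features").reset_index()


# Toy data for illustration only, not taken from the real dataset.
toy = pd.DataFrame({
    "junction_control": [-1, 4, -1, 2],
    "speed_limit": [30, 30, 60, -1],
    "road_type": [6, 6, 3, 6],
})

toy_rates = missing_rate(toy)
print(toy_rates)
```

The same helper could be applied to each of the three datasets in turn, which avoids repeating the expression for every DataFrame.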
a = pd.DataFrame(np.transpose(np.array((characteristics.columns,round(characteristics.isin([-1, '-1']).sum()/characteristics.shape[0]*100,2)),dtype=object,)),columns=['features','missing_rate'])
b = pd.DataFrame(np.transpose(np.array((vehicles.columns,round(vehicles.isin([-1, '-1']).sum()/vehicles.shape[0]*100,2)),dtype=object,)),columns=['features','missing_rate'])
c = pd.DataFrame(np.transpose(np.array((casualty.columns,round(casualty.isin([-1, '-1']).sum()/casualty.shape[0]*100,2)),dtype=object,)),columns=['features','missing_rate'])
def highlight_greaterthan(x):
    # Pick a row background colour from the missing rate (thresholds: 80, 40 and 5 percent).
    if x.missing_rate > 80:
        return ['background-color: #FFCECE']*2
    if x.missing_rate > 40:
        return ['background-color: #FFE9CE']*2
    if x.missing_rate > 5:
        return ['background-color: #FFFECE']*2
    else:
        return ['background-color: #CEFFFC']*2
a = a.style.apply(highlight_greaterthan, axis=1).set_table_styles([{
'selector': 'caption',
'props': [
('color', '#585858'),
('font-size', '30px')
]
}])
b = b.style.apply(highlight_greaterthan, axis=1).set_table_styles([{
'selector': 'caption',
'props': [
('color', '#585858'),
('font-size', '30px')
]
}])
c = c.style.apply(highlight_greaterthan, axis=1).set_table_styles([{
'selector': 'caption',
'props': [
('color', '#585858'),
('font-size', '30px')
]
}])
a_styler = a.set_table_attributes("style='display:inline'").set_caption('characteristics')
b_styler = b.set_table_attributes("style='display:inline'").set_caption('vehicles')
c_styler = c.set_table_attributes("style='display:inline'").set_caption('casualty')
space = "\xa0" * 50
display_html(a_styler._repr_html_() + space + b_styler._repr_html_() + space + c_styler._repr_html_() + space, raw=True)
| | features | missing_rate |
|---|---|---|
| 0 | accident_year | 0.000000 |
| 1 | accident_reference | 0.000000 |
| 2 | location_easting_osgr | 0.000000 |
| 3 | location_northing_osgr | 0.000000 |
| 4 | longitude | 0.000000 |
| 5 | latitude | 0.000000 |
| 6 | police_force | 0.000000 |
| 7 | accident_severity | 0.000000 |
| 8 | number_of_vehicles | 0.000000 |
| 9 | number_of_casualties | 0.000000 |
| 10 | date | 0.000000 |
| 11 | day_of_week | 0.000000 |
| 12 | time | 0.000000 |
| 13 | local_authority_district | 1.090000 |
| 14 | local_authority_ons_district | 0.000000 |
| 15 | local_authority_highway | 0.000000 |
| 16 | first_road_class | 0.000000 |
| 17 | first_road_number | 0.000000 |
| 18 | road_type | 0.000000 |
| 19 | speed_limit | 0.010000 |
| 20 | junction_detail | 0.000000 |
| 21 | junction_control | 41.990000 |
| 22 | second_road_class | 0.000000 |
| 23 | second_road_number | 0.010000 |
| 24 | pedestrian_crossing_human_control | 0.160000 |
| 25 | pedestrian_crossing_physical_facilities | 0.150000 |
| 26 | light_conditions | 0.000000 |
| 27 | weather_conditions | 0.000000 |
| 28 | road_surface_conditions | 0.350000 |
| 29 | special_conditions_at_site | 0.240000 |
| 30 | carriageway_hazards | 0.230000 |
| 31 | urban_or_rural_area | 0.000000 |
| 32 | did_police_officer_attend_scene_of_accident | 0.000000 |
| 33 | trunk_road_flag | 7.360000 |
| 34 | lsoa_of_accident_location | 4.220000 |
| | features | missing_rate |
|---|---|---|
| 0 | accident_year | 0.000000 |
| 1 | accident_reference | 0.000000 |
| 2 | vehicle_reference | 0.000000 |
| 3 | vehicle_type | 0.000000 |
| 4 | towing_and_articulation | 0.410000 |
| 5 | vehicle_manoeuvre | 0.410000 |
| 6 | vehicle_direction_from | 0.980000 |
| 7 | vehicle_direction_to | 0.990000 |
| 8 | vehicle_location_restricted_lane | 0.380000 |
| 9 | junction_location | 0.170000 |
| 10 | skidding_and_overturning | 0.400000 |
| 11 | hit_object_in_carriageway | 0.380000 |
| 12 | vehicle_leaving_carriageway | 0.380000 |
| 13 | hit_object_off_carriageway | 0.000000 |
| 14 | first_point_of_impact | 0.560000 |
| 15 | vehicle_left_hand_drive | 0.520000 |
| 16 | journey_purpose_of_driver | 0.110000 |
| 17 | sex_of_driver | 0.010000 |
| 18 | age_of_driver | 13.950000 |
| 19 | age_band_of_driver | 13.950000 |
| 20 | engine_capacity_cc | 26.050000 |
| 21 | propulsion_code | 25.700000 |
| 22 | age_of_vehicle | 25.730000 |
| 23 | generic_make_model | 28.460000 |
| 24 | driver_imd_decile | 18.760000 |
| 25 | driver_home_area_type | 18.650000 |
| | features | missing_rate |
|---|---|---|
| 0 | accident_year | 0.000000 |
| 1 | accident_reference | 0.000000 |
| 2 | vehicle_reference | 0.000000 |
| 3 | casualty_reference | 0.000000 |
| 4 | casualty_class | 0.000000 |
| 5 | sex_of_casualty | 0.650000 |
| 6 | age_of_casualty | 2.150000 |
| 7 | age_band_of_casualty | 2.150000 |
| 8 | casualty_severity | 0.000000 |
| 9 | pedestrian_location | 0.000000 |
| 10 | pedestrian_movement | 0.000000 |
| 11 | car_passenger | 0.270000 |
| 12 | bus_or_coach_passenger | 0.020000 |
| 13 | pedestrian_road_maintenance_worker | 0.080000 |
| 14 | casualty_type | 0.000000 |
| 15 | casualty_home_area_type | 9.310000 |
| 16 | casualty_imd_decile | 9.440000 |
From this we can observe some irregularities:
Characteristics
1. junction_control has about 42% missing values. It can be excluded, as it only adds detail to junction_detail, which is considered sufficient.
2. trunk_road_flag has about 7% missing values. It concerns who manages the road network on which the event took place. It can be excluded, as it is another additional descriptive feature.
Vehicle
1. age_of_driver has about 14% missing values. Most of the missing rates in this table fall in a similar range. A possible cause is the problem described in the proposal: drivers fleeing the scene.
2. age_band_of_driver has about 14% missing values. This is directly linked to age_of_driver.
3. engine_capacity_cc has about 26% missing values. In some circumstances descriptive features of the vehicle appear to have been left empty. The same applies to items 4 to 8 on this list. These gaps may be linked, but this is not certain because the other features are present. For now they do not account for a large share, so we leave them in.
4. propulsion_code has about 26% missing values.
5. age_of_vehicle has about 26% missing values.
6. generic_make_model has about 28% missing values.
7. driver_imd_decile has about 19% missing values.
8. driver_home_area_type has about 19% missing values.
Casualty
1. casualty_home_area_type has about 9% missing values. This may be directly related to the previous cases. There might be specific circumstances in which these features cannot be applied to the situation, or the cases fall into a specific category of their own for which no law enforcement is applicable. These are small shares, but we need more insight to determine whether they can be excluded.
2. casualty_imd_decile has about 9% missing values.
To see whether the data is usable for modeling, we need to check the types, as most classification algorithms expect float- or int-like values.
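The type check described above can also be sketched compactly. The snippet below is only an illustration on toy data: the helper name `non_numeric_columns` is hypothetical, and it assumes that every column that is not numeric needs further inspection before modeling.

```python
import pandas as pd


def non_numeric_columns(df: pd.DataFrame) -> list:
    """List the columns whose dtype is not numeric (for example object columns)."""
    return df.select_dtypes(exclude="number").columns.tolist()


# Toy data for illustration only, mimicking a few columns of the characteristics set.
toy = pd.DataFrame({
    "accident_year": [2020, 2020],
    "longitude": [-0.25, -0.13],
    "date": ["04/02/2020", "27/04/2020"],
    "time": ["09:00", "13:55"],
})

print(toy.dtypes)
print(non_numeric_columns(toy))
```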
characteristics_dtypes = pd.DataFrame(np.transpose(np.array((characteristics.columns,characteristics.dtypes),dtype=object,)),columns=['features','dtype'])
vehicles_dtypes = pd.DataFrame(np.transpose(np.array((vehicles.columns,vehicles.dtypes),dtype=object,)),columns=['features','dtype'])
casualty_dtypes = pd.DataFrame(np.transpose(np.array((casualty.columns,casualty.dtypes),dtype=object,)),columns=['features','dtype'])
characteristics_dtypes = characteristics_dtypes.style.set_table_styles([{
'selector': 'caption',
'props': [
('color', '#585858'),
('font-size', '30px')
]
}])
vehicles_dtypes = vehicles_dtypes.style.set_table_styles([{
'selector': 'caption',
'props': [
('color', '#585858'),
('font-size', '30px')
]
}])
casualty_dtypes = casualty_dtypes.style.set_table_styles([{
'selector': 'caption',
'props': [
('color', '#585858'),
('font-size', '30px')
]
}])
characteristics_dtypes_styler = characteristics_dtypes.set_table_attributes("style='display:inline'").set_caption('characteristics')
vehicles_dtypes_styler = vehicles_dtypes.set_table_attributes("style='display:inline'").set_caption('vehicles')
casualty_dtypes_styler = casualty_dtypes.set_table_attributes("style='display:inline'").set_caption('casualty')
space = "\xa0" * 50
display_html(characteristics_dtypes_styler._repr_html_() + space + vehicles_dtypes_styler._repr_html_() + space +
casualty_dtypes_styler._repr_html_() + space, raw=True)
| | features | dtype |
|---|---|---|
| 0 | accident_year | int64 |
| 1 | accident_reference | object |
| 2 | location_easting_osgr | float64 |
| 3 | location_northing_osgr | float64 |
| 4 | longitude | float64 |
| 5 | latitude | float64 |
| 6 | police_force | int64 |
| 7 | accident_severity | int64 |
| 8 | number_of_vehicles | int64 |
| 9 | number_of_casualties | int64 |
| 10 | date | object |
| 11 | day_of_week | int64 |
| 12 | time | object |
| 13 | local_authority_district | int64 |
| 14 | local_authority_ons_district | object |
| 15 | local_authority_highway | object |
| 16 | first_road_class | int64 |
| 17 | first_road_number | int64 |
| 18 | road_type | int64 |
| 19 | speed_limit | int64 |
| 20 | junction_detail | int64 |
| 21 | junction_control | int64 |
| 22 | second_road_class | int64 |
| 23 | second_road_number | int64 |
| 24 | pedestrian_crossing_human_control | int64 |
| 25 | pedestrian_crossing_physical_facilities | int64 |
| 26 | light_conditions | int64 |
| 27 | weather_conditions | int64 |
| 28 | road_surface_conditions | int64 |
| 29 | special_conditions_at_site | int64 |
| 30 | carriageway_hazards | int64 |
| 31 | urban_or_rural_area | int64 |
| 32 | did_police_officer_attend_scene_of_accident | int64 |
| 33 | trunk_road_flag | int64 |
| 34 | lsoa_of_accident_location | object |
| | features | dtype |
|---|---|---|
| 0 | accident_year | int64 |
| 1 | accident_reference | object |
| 2 | vehicle_reference | int64 |
| 3 | vehicle_type | int64 |
| 4 | towing_and_articulation | int64 |
| 5 | vehicle_manoeuvre | int64 |
| 6 | vehicle_direction_from | int64 |
| 7 | vehicle_direction_to | int64 |
| 8 | vehicle_location_restricted_lane | int64 |
| 9 | junction_location | int64 |
| 10 | skidding_and_overturning | int64 |
| 11 | hit_object_in_carriageway | int64 |
| 12 | vehicle_leaving_carriageway | int64 |
| 13 | hit_object_off_carriageway | int64 |
| 14 | first_point_of_impact | int64 |
| 15 | vehicle_left_hand_drive | int64 |
| 16 | journey_purpose_of_driver | int64 |
| 17 | sex_of_driver | int64 |
| 18 | age_of_driver | int64 |
| 19 | age_band_of_driver | int64 |
| 20 | engine_capacity_cc | int64 |
| 21 | propulsion_code | int64 |
| 22 | age_of_vehicle | int64 |
| 23 | generic_make_model | object |
| 24 | driver_imd_decile | int64 |
| 25 | driver_home_area_type | int64 |
| | features | dtype |
|---|---|---|
| 0 | accident_year | int64 |
| 1 | accident_reference | object |
| 2 | vehicle_reference | int64 |
| 3 | casualty_reference | int64 |
| 4 | casualty_class | int64 |
| 5 | sex_of_casualty | int64 |
| 6 | age_of_casualty | int64 |
| 7 | age_band_of_casualty | int64 |
| 8 | casualty_severity | int64 |
| 9 | pedestrian_location | int64 |
| 10 | pedestrian_movement | int64 |
| 11 | car_passenger | int64 |
| 12 | bus_or_coach_passenger | int64 |
| 13 | pedestrian_road_maintenance_worker | int64 |
| 14 | casualty_type | int64 |
| 15 | casualty_home_area_type | int64 |
| 16 | casualty_imd_decile | int64 |
As we can see, most of the data is float or int. This is because the data is already encoded as categories and can be decoded with the reference table. Some object columns remain, however. We need to inspect these and the other features to check whether they can be used for modeling.
To find usable features we need to look at the distributions. We can pick out points of interest and display them further using the reference table.
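Decoding a feature with the reference table could look like the following sketch. The mapping is copied from the sex_of_driver rows of the reference table; the names `SEX_OF_DRIVER_LABELS` and `toy_vehicles` and the toy values are assumptions made for illustration.

```python
import pandas as pd

# Code-to-label mapping copied from the sex_of_driver rows of the reference table.
SEX_OF_DRIVER_LABELS = {
    1: "Male",
    2: "Female",
    3: "Not known",
    -1: "Data missing or out of range",
}

# Toy column for illustration only, not taken from the real dataset.
toy_vehicles = pd.DataFrame({"sex_of_driver": [1, 2, 1, -1, 3, 1]})

# Replace each code by its label, then count how often each label occurs.
decoded = toy_vehicles["sex_of_driver"].map(SEX_OF_DRIVER_LABELS)
distribution = decoded.value_counts()
print(distribution)
```

Codes that are absent from the mapping would become NaN after `map`, so checking `decoded.isna().sum()` is a cheap way to spot an incomplete mapping.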
First we check what kinds of objects are in each dataset.
for col in characteristics.select_dtypes('object'):
    print('/n')
    print('Number of values in "', col, '"', {characteristics[col].nunique()})
    print(characteristics[col].unique())
    print('/n')
    print('------------------------------------------------')
/n
Number of values in " accident_reference " {91199}
[10219808 10220496 10228005 ... '991030297' '991030900' '991032575']
/n
------------------------------------------------
/n
Number of values in " date " {366}
['04/02/2020' '27/04/2020' '01/01/2020' '02/01/2020' '03/01/2020'
'04/01/2020' '05/01/2020' '06/01/2020' '07/01/2020' '08/01/2020'
'09/01/2020' '10/01/2020' '11/01/2020' '12/01/2020' '13/01/2020'
'14/01/2020' '15/01/2020' '16/01/2020' '17/01/2020' '18/01/2020'
'19/01/2020' '20/01/2020' '21/01/2020' '22/01/2020' '23/01/2020'
'24/01/2020' '25/01/2020' '26/01/2020' '27/01/2020' '03/06/2020'
'28/01/2020' '29/01/2020' '30/01/2020' '31/01/2020' '01/02/2020'
'02/02/2020' '03/02/2020' '05/02/2020' '06/02/2020' '07/02/2020'
'08/02/2020' '09/02/2020' '10/02/2020' '11/02/2020' '12/02/2020'
'13/02/2020' '14/02/2020' '15/02/2020' '16/02/2020' '17/02/2020'
'18/02/2020' '19/02/2020' '20/02/2020' '21/02/2020' '22/02/2020'
'23/02/2020' '24/02/2020' '25/02/2020' '26/02/2020' '27/02/2020'
'28/02/2020' '29/02/2020' '01/03/2020' '02/03/2020' '03/03/2020'
'04/03/2020' '05/03/2020' '07/04/2020' '06/03/2020' '07/03/2020'
'08/03/2020' '09/03/2020' '10/03/2020' '11/03/2020' '12/03/2020'
'13/03/2020' '14/03/2020' '15/03/2020' '16/03/2020' '17/03/2020'
'18/03/2020' '19/03/2020' '20/03/2020' '21/03/2020' '22/03/2020'
'23/03/2020' '24/03/2020' '25/03/2020' '26/03/2020' '27/03/2020'
'28/03/2020' '29/03/2020' '30/03/2020' '31/03/2020' '01/04/2020'
'02/04/2020' '03/04/2020' '04/04/2020' '05/04/2020' '06/04/2020'
'08/04/2020' '09/04/2020' '10/04/2020' '11/04/2020' '12/04/2020'
'13/04/2020' '14/04/2020' '15/04/2020' '16/04/2020' '17/04/2020'
'18/04/2020' '19/04/2020' '20/04/2020' '21/04/2020' '22/04/2020'
'23/04/2020' '24/04/2020' '25/04/2020' '26/04/2020' '28/04/2020'
'29/04/2020' '30/04/2020' '01/05/2020' '02/05/2020' '03/05/2020'
'04/05/2020' '05/05/2020' '06/05/2020' '07/05/2020' '08/05/2020'
'09/05/2020' '10/05/2020' '11/05/2020' '12/05/2020' '13/05/2020'
'14/05/2020' '15/05/2020' '16/05/2020' '17/05/2020' '18/05/2020'
'19/05/2020' '20/05/2020' '21/05/2020' '22/05/2020' '23/05/2020'
'24/05/2020' '25/05/2020' '26/05/2020' '27/05/2020' '28/05/2020'
'29/05/2020' '30/05/2020' '31/05/2020' '01/06/2020' '02/06/2020'
'04/06/2020' '05/06/2020' '06/06/2020' '07/06/2020' '08/06/2020'
'09/06/2020' '10/06/2020' '11/06/2020' '12/06/2020' '13/06/2020'
'14/06/2020' '15/06/2020' '16/06/2020' '17/06/2020' '18/06/2020'
'19/06/2020' '20/06/2020' '21/06/2020' '22/06/2020' '23/06/2020'
'11/07/2020' '24/06/2020' '25/06/2020' '26/06/2020' '27/06/2020'
'28/06/2020' '29/06/2020' '30/06/2020' '01/07/2020' '02/07/2020'
'03/07/2020' '04/07/2020' '05/07/2020' '06/07/2020' '07/07/2020'
'08/07/2020' '09/07/2020' '10/07/2020' '12/07/2020' '13/07/2020'
'14/07/2020' '15/07/2020' '16/07/2020' '17/07/2020' '18/07/2020'
'19/07/2020' '20/07/2020' '21/07/2020' '22/07/2020' '23/07/2020'
'24/07/2020' '25/07/2020' '27/07/2020' '26/07/2020' '28/07/2020'
'29/07/2020' '30/07/2020' '31/07/2020' '01/08/2020' '02/08/2020'
'03/08/2020' '26/09/2020' '04/08/2020' '05/08/2020' '06/08/2020'
'07/08/2020' '08/08/2020' '09/08/2020' '10/08/2020' '11/08/2020'
'12/08/2020' '13/08/2020' '14/08/2020' '15/08/2020' '16/08/2020'
'17/08/2020' '18/08/2020' '19/08/2020' '20/08/2020' '21/08/2020'
'22/08/2020' '23/08/2020' '24/08/2020' '25/08/2020' '26/08/2020'
'27/08/2020' '28/08/2020' '29/08/2020' '30/08/2020' '31/08/2020'
'01/09/2020' '02/09/2020' '03/09/2020' '04/09/2020' '05/09/2020'
'06/09/2020' '07/09/2020' '08/09/2020' '09/09/2020' '10/09/2020'
'11/09/2020' '12/09/2020' '13/09/2020' '14/09/2020' '15/09/2020'
'16/09/2020' '17/09/2020' '18/09/2020' '19/09/2020' '20/09/2020'
'21/09/2020' '22/09/2020' '23/09/2020' '24/09/2020' '25/09/2020'
'27/09/2020' '28/09/2020' '29/09/2020' '30/09/2020' '01/10/2020'
'02/10/2020' '03/10/2020' '04/10/2020' '05/10/2020' '06/10/2020'
'07/10/2020' '08/10/2020' '09/10/2020' '10/10/2020' '11/10/2020'
'12/10/2020' '13/10/2020' '14/10/2020' '15/10/2020' '16/10/2020'
'17/10/2020' '18/10/2020' '19/10/2020' '20/10/2020' '21/10/2020'
'22/10/2020' '23/10/2020' '24/10/2020' '25/10/2020' '26/10/2020'
'27/10/2020' '28/10/2020' '29/10/2020' '30/10/2020' '31/10/2020'
'01/11/2020' '02/11/2020' '03/11/2020' '04/11/2020' '05/11/2020'
'06/11/2020' '07/11/2020' '08/11/2020' '09/11/2020' '10/11/2020'
'11/11/2020' '12/11/2020' '13/11/2020' '14/11/2020' '15/11/2020'
'16/11/2020' '17/11/2020' '18/11/2020' '19/11/2020' '20/11/2020'
'21/11/2020' '22/11/2020' '23/11/2020' '24/11/2020' '25/11/2020'
'26/11/2020' '27/11/2020' '28/11/2020' '29/11/2020' '30/11/2020'
'01/12/2020' '02/12/2020' '03/12/2020' '04/12/2020' '05/12/2020'
'06/12/2020' '07/12/2020' '08/12/2020' '09/12/2020' '10/12/2020'
'11/12/2020' '12/12/2020' '13/12/2020' '14/12/2020' '15/12/2020'
'16/12/2020' '17/12/2020' '18/12/2020' '19/12/2020' '20/12/2020'
'21/12/2020' '22/12/2020' '23/12/2020' '24/12/2020' '25/12/2020'
'26/12/2020' '27/12/2020' '28/12/2020' '29/12/2020' '30/12/2020'
'31/12/2020']
/n
------------------------------------------------
/n
Number of values in " time " {1438}
['09:00' '13:55' '01:25' ... '04:51' '05:11' '04:42']
/n
------------------------------------------------
/n
Number of values in " local_authority_ons_district " {378}
['E09000032' 'E09000022' 'E09000033' 'E09000025' 'E09000023' 'E09000011'
'E09000030' 'E09000014' 'E09000010' 'E09000006' 'E09000016' 'E09000029'
'E09000019' 'E09000005' 'E09000008' 'E09000020' 'E09000003' 'E09000026'
'E09000002' 'E09000021' 'E09000012' 'E09000028' 'E09000024' 'E09000017'
'E09000018' 'E09000013' 'E09000009' 'E09000031' 'E09000004' 'E09000027'
'E09000007' 'E09000015' 'EHEATHROW' 'E07000029' 'E07000030' 'E07000028'
'E07000031' 'E07000026' 'E07000027' 'E06000009' 'E07000125' 'E07000128'
'E07000123' 'E07000122' 'E07000119' 'E07000121' 'E06000008' 'E07000120'
'E07000117' 'E07000127' 'E07000126' 'E07000118' 'E07000124' 'E08000013'
'E08000015' 'E08000011' 'E08000012' 'E08000014' 'E08000007' 'E08000010'
'E08000001' 'E08000005' 'E08000003' 'E08000006' 'E08000008' 'E08000009'
'E08000002' 'E08000004' 'E06000007' 'E06000006' 'E06000050' 'E06000049'
'E08000021' 'E08000024' 'E06000048' 'E08000022' 'E08000020' 'E08000023'
'E06000047' 'E06000005' 'E07000163' 'E07000168' 'E07000167' 'E06000014'
'E07000165' 'E07000169' 'E07000164' 'E07000166' 'E08000035' 'E08000034'
'E08000036' 'E08000032' 'E08000033' 'E08000017' 'E08000016' 'E08000019'
'E08000018' 'E06000010' 'E06000013' 'E06000012' 'E06000011' 'E06000001'
'E06000003' 'E06000002' 'E06000004' 'E08000029' 'E08000028' 'E08000025'
'E08000031' 'E08000030' 'E08000027' 'E08000026' 'E07000195' 'E07000194'
'E07000197' 'E07000193' 'E06000021' 'E07000198' 'E07000192' 'E07000196'
'E07000199' 'E07000234' 'E07000236' 'E06000051' 'E07000239' 'E07000235'
'E06000020' 'E06000019' 'E07000237' 'E07000238' 'E07000221' 'E07000219'
'E07000218' 'E07000222' 'E07000220' 'E07000033' 'E07000032' 'E07000037'
'E07000036' 'E07000039' 'E07000035' 'E06000015' 'E07000038' 'E07000034'
'E07000176' 'E07000170' 'E07000174' 'E07000173' 'E07000175' 'E07000171'
'E06000018' 'E07000172' 'E07000140' 'E07000136' 'E07000138' 'E07000137'
'E07000141' 'E07000142' 'E07000139' 'E07000130' 'E07000131' 'E06000016'
'E07000135' 'E06000017' 'E07000129' 'E07000134' 'E07000133' 'E07000132'
'E07000155' 'E07000156' 'E07000151' 'E07000150' 'E07000152' 'E07000153'
'E07000154' 'E07000008' 'E07000010' 'E07000012' 'E06000031' 'E07000011'
'E07000009' 'E07000147' 'E07000146' 'E07000149' 'E07000143' 'E07000144'
'E07000148' 'E07000145' 'E07000200' 'E07000244' 'E07000203' 'E07000245'
'E07000202' 'E06000032' 'E06000056' 'E06000055' 'E07000095' 'E07000243'
'E07000242' 'E07000240' 'E07000096' 'E07000241' 'E07000099' 'E07000098'
'E07000103' 'E07000102' 'E07000072' 'E07000067' 'E07000066' 'E06000034'
'E07000070' 'E07000068' 'E07000077' 'E06000033' 'E07000071' 'E07000076'
'E07000069' 'E07000075' 'E07000074' 'E07000073' 'E07000179' 'E07000007'
'E07000177' 'E07000005' 'E07000006' 'E06000039' 'E06000042' 'E07000180'
'E06000038' 'E06000041' 'E06000036' 'E06000040' 'E07000178' 'E06000037'
'E07000004' 'E07000181' 'E07000086' 'E07000092' 'E06000044' 'E07000085'
'E07000090' 'E06000045' 'E07000087' 'E07000094' 'E07000088' 'E07000084'
'E06000046' 'E07000093' 'E07000091' 'E07000089' 'E07000215' 'E07000210'
'E07000208' 'E07000211' 'E07000212' 'E07000216' 'E07000213' 'E07000207'
'E07000217' 'E07000209' 'E07000214' 'E06000035' 'E07000106' 'E07000107'
'E07000105' 'E07000109' 'E07000116' 'E07000115' 'E07000110' 'E07000113'
'E07000111' 'E07000114' 'E07000112' 'E07000108' 'E07000228' 'E07000224'
'E07000064' 'E07000061' 'E07000065' 'E07000225' 'E06000043' 'E07000229'
'E07000062' 'E07000226' 'E07000063' 'E07000227' 'E07000223' 'E09000001'
'E06000052' 'E07000044' 'E07000041' 'E07000045' 'E07000047' 'E06000026'
'E07000040' 'E06000027' 'E07000046' 'E07000043' 'E07000042' 'E07000188'
'E07000187' 'E07000190' 'E07000189' 'E07000191' 'E06000025' 'E06000023'
'E06000022' 'E06000024' 'E07000083' 'E07000078' 'E07000082' 'E07000080'
'E07000081' 'E07000079' 'E06000030' 'E06000054' 'E06000029' 'E06000028'
'E07000048' 'E07000051' 'E07000053' 'E07000049' 'E07000052' 'E07000050'
'W06000002' 'W06000001' 'W06000004' 'W06000006' 'W06000003' 'W06000005'
'W06000022' 'W06000019' 'W06000018' 'W06000021' 'W06000020' 'W06000016'
'W06000024' 'W06000015' 'W06000012' 'W06000011' 'W06000014' 'W06000013'
'W06000023' 'W06000010' 'W06000008' 'W06000009' 'S12000019' 'S12000036'
'S12000043' 'S12000017' 'S12000035' 'S12000044' 'S12000029' 'S12000040'
'S12000021' 'S12000006' 'S12000041' 'S12000030' 'S12000014' 'S12000008'
'S12000038' 'S12000015' 'S12000034' 'S12000024' 'S12000042' 'S12000039'
'S12000010' 'S12000026' 'S12000028' 'S12000005' 'S12000033' 'S12000011'
'S12000009' 'S12000018' 'S12000020' 'S12000023' 'S12000013' 'S12000027']
/n
------------------------------------------------
/n
Number of values in " local_authority_highway " {206}
['E09000032' 'E09000022' 'E09000033' 'E09000025' 'E09000023' 'E09000011'
'E09000030' 'E09000014' 'E09000010' 'E09000006' 'E09000016' 'E09000029'
'E09000019' 'E09000005' 'E09000008' 'E09000020' 'E09000003' 'E09000026'
'E09000002' 'E09000021' 'E09000012' 'E09000028' 'E09000024' 'E09000017'
'E09000018' 'E09000013' 'E09000009' 'E09000031' 'E09000004' 'E09000027'
'E09000007' 'E09000015' 'EHEATHROW' 'E10000006' 'E06000009' 'E10000017'
'E06000008' 'E08000013' 'E08000015' 'E08000011' 'E08000012' 'E08000014'
'E08000007' 'E08000010' 'E08000001' 'E08000005' 'E08000003' 'E08000006'
'E08000008' 'E08000009' 'E08000002' 'E08000004' 'E06000007' 'E06000006'
'E06000050' 'E06000049' 'E08000021' 'E08000024' 'E06000048' 'E08000022'
'E08000020' 'E08000023' 'E06000047' 'E06000005' 'E10000023' 'E06000014'
'E08000035' 'E08000034' 'E08000036' 'E08000032' 'E08000033' 'E08000017'
'E08000016' 'E08000019' 'E08000018' 'E06000010' 'E06000013' 'E06000012'
'E06000011' 'E06000001' 'E06000003' 'E06000002' 'E06000004' 'E08000029'
'E08000028' 'E08000025' 'E08000031' 'E08000030' 'E08000027' 'E08000026'
'E10000028' 'E06000021' 'E10000034' 'E06000051' 'E06000020' 'E06000019'
'E10000031' 'E10000007' 'E06000015' 'E10000024' 'E06000018' 'E10000019'
'E10000018' 'E06000016' 'E06000017' 'E10000021' 'E10000003' 'E06000031'
'E10000020' 'E10000029' 'E06000032' 'E06000056' 'E06000055' 'E10000015'
'E10000012' 'E06000034' 'E06000033' 'E10000025' 'E10000002' 'E06000039'
'E06000042' 'E06000038' 'E06000041' 'E06000036' 'E06000040' 'E06000037'
'E10000014' 'E06000044' 'E06000045' 'E06000046' 'E10000030' 'E06000035'
'E10000016' 'E10000032' 'E10000011' 'E06000043' 'E09000001' 'E06000052'
'E10000008' 'E06000026' 'E06000027' 'E10000027' 'E06000025' 'E06000023'
'E06000022' 'E06000024' 'E10000013' 'E06000030' 'E06000054' 'E06000029'
'E06000028' 'E10000009' 'W06000002' 'W06000001' 'W06000004' 'W06000006'
'W06000003' 'W06000005' 'W06000022' 'W06000019' 'W06000018' 'W06000021'
'W06000020' 'W06000016' 'W06000024' 'W06000015' 'W06000012' 'W06000011'
'W06000014' 'W06000013' 'W06000023' 'W06000010' 'W06000008' 'W06000009'
'S12000019' 'S12000036' 'S12000043' 'S12000017' 'S12000035' 'S12000044'
'S12000029' 'S12000040' 'S12000021' 'S12000006' 'S12000041' 'S12000030'
'S12000014' 'S12000008' 'S12000038' 'S12000015' 'S12000034' 'S12000024'
'S12000042' 'S12000039' 'S12000010' 'S12000026' 'S12000028' 'S12000005'
'S12000033' 'S12000011' 'S12000009' 'S12000018' 'S12000020' 'S12000023'
'S12000013' 'S12000027']
/n
------------------------------------------------
/n
Number of values in " lsoa_of_accident_location " {25931}
['E01004576' 'E01003034' 'E01004726' ... 'W01000465' 'W01000466'
'W01000481']
/n
------------------------------------------------
What we observe here is mostly reference information for the accident and the local authority in which it took place. These features can be excluded, as the geographical data we already have describes the same thing.
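Excluding these features could be sketched as follows. Which columns count as redundant is an assumption here: the list contains the local authority and LSOA columns, while accident_reference is kept because it may still be needed to link the three datasets. The helper name `drop_redundant_location` and the toy values are hypothetical.

```python
import pandas as pd

# Assumed selection of location reference columns that longitude/latitude already cover.
REDUNDANT_LOCATION_COLUMNS = [
    "local_authority_ons_district",
    "local_authority_highway",
    "lsoa_of_accident_location",
]


def drop_redundant_location(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy without the assumed redundant columns; absent columns are ignored."""
    return df.drop(columns=REDUNDANT_LOCATION_COLUMNS, errors="ignore")


# Toy data for illustration only.
toy = pd.DataFrame({
    "accident_reference": ["10219808", "10220496"],
    "longitude": [-0.25, -0.13],
    "latitude": [51.46, 51.43],
    "local_authority_ons_district": ["E09000032", "E09000022"],
    "lsoa_of_accident_location": ["E01004576", "E01003034"],
})

reduced = drop_redundant_location(toy)
print(reduced.columns.tolist())
```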
What is interesting is the date. Let's look at how the accidents are distributed over time.
To do this, we first need to convert the date object into a datetime format.
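A minimal sketch of such a conversion is shown below, assuming the dates are day-first strings such as '04/02/2020' (4 February 2020), as in the printed values. The toy frame and the name `daily_counts` are hypothetical.

```python
import pandas as pd

# Toy data for illustration only; the strings follow the DD/MM/YYYY pattern.
toy = pd.DataFrame({"date": ["04/02/2020", "27/04/2020", "04/02/2020"]})

# An explicit format avoids the day and month being swapped during parsing.
toy["date"] = pd.to_datetime(toy["date"], format="%d/%m/%Y")

# Count the accidents per day, sorted chronologically.
daily_counts = (
    toy.groupby("date")
    .size()
    .rename("Total")
    .rename_axis("Date")
    .reset_index()
)
print(daily_counts)
```

With a real datetime column, the counts sort chronologically and can be resampled by week or month, which is not possible while the dates are plain strings.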
# Make a separate DataFrame with the date and the total count of accidents.
# rename_axis/reset_index(name=...) gives the same column names regardless of how the
# installed pandas version names the value_counts result.
timefreq = characteristics['date'].value_counts().rename_axis('Date').reset_index(name='Total')
timefreq
| | Date | Total |
|---|---|---|
| 0 | 06/02/2020 | 426 |
| 1 | 04/11/2020 | 414 |
| 2 | 06/03/2020 | 411 |
| 3 | 21/01/2020 | 399 |
| 4 | 10/01/2020 | 397 |
| 5 | 18/09/2020 | 397 |
| 6 | 17/01/2020 | 396 |
| 7 | 14/01/2020 | 395 |
| 8 | 07/02/2020 | 391 |
| 9 | 20/01/2020 | 386 |
| 10 | 04/12/2020 | 385 |
| 11 | 03/12/2020 | 382 |
| 12 | 09/10/2020 | 381 |
| 13 | 24/01/2020 | 381 |
| 14 | 31/07/2020 | 377 |
| 15 | 02/10/2020 | 375 |
| 16 | 15/01/2020 | 374 |
| 17 | 11/12/2020 | 372 |
| 18 | 13/01/2020 | 370 |
| 19 | 25/09/2020 | 369 |
| 20 | 15/09/2020 | 367 |
| 21 | 15/12/2020 | 362 |
| 22 | 31/01/2020 | 361 |
| 23 | 25/06/2020 | 360 |
| 24 | 28/08/2020 | 359 |
| 25 | 18/01/2020 | 357 |
| 26 | 28/01/2020 | 356 |
| 27 | 03/11/2020 | 352 |
| 28 | 12/02/2020 | 352 |
| 29 | 20/08/2020 | 351 |
| 30 | 05/03/2020 | 347 |
| 31 | 26/02/2020 | 347 |
| 32 | 17/09/2020 | 344 |
| 33 | 14/09/2020 | 341 |
| 34 | 09/01/2020 | 340 |
| 35 | 10/12/2020 | 339 |
| 36 | 30/01/2020 | 338 |
| 37 | 24/09/2020 | 338 |
| 38 | 05/02/2020 | 337 |
| 39 | 29/01/2020 | 337 |
| 40 | 30/09/2020 | 336 |
| 41 | 23/10/2020 | 336 |
| 42 | 15/10/2020 | 336 |
| 43 | 02/12/2020 | 336 |
| 44 | 01/12/2020 | 335 |
| 45 | 02/03/2020 | 334 |
| 46 | 17/07/2020 | 334 |
| 47 | 27/01/2020 | 333 |
| 48 | 23/01/2020 | 332 |
| 49 | 07/08/2020 | 331 |
| 50 | 17/12/2020 | 330 |
| 51 | 30/07/2020 | 330 |
| 52 | 10/09/2020 | 329 |
| 53 | 13/02/2020 | 328 |
| 54 | 28/02/2020 | 328 |
| 55 | 27/02/2020 | 327 |
| 56 | 16/10/2020 | 327 |
| 57 | 10/07/2020 | 327 |
| 58 | 21/09/2020 | 327 |
| 59 | 01/10/2020 | 326 |
| 60 | 25/02/2020 | 326 |
| 61 | 08/08/2020 | 325 |
| 62 | 03/03/2020 | 324 |
| 63 | 09/09/2020 | 320 |
| 64 | 14/02/2020 | 318 |
| 65 | 16/01/2020 | 317 |
| 66 | 22/09/2020 | 316 |
| 67 | 04/02/2020 | 316 |
| 68 | 19/09/2020 | 316 |
| 69 | 09/03/2020 | 315 |
| 70 | 11/07/2020 | 314 |
| 71 | 12/09/2020 | 313 |
| 72 | 09/12/2020 | 313 |
| 73 | 29/09/2020 | 312 |
| 74 | 08/12/2020 | 312 |
| 75 | 11/08/2020 | 311 |
| 76 | 04/09/2020 | 311 |
| 77 | 12/07/2020 | 310 |
| 78 | 02/09/2020 | 310 |
| 79 | 04/03/2020 | 310 |
| 80 | 12/08/2020 | 310 |
| 81 | 11/09/2020 | 309 |
| 82 | 05/08/2020 | 309 |
| 83 | 24/06/2020 | 308 |
| 84 | 16/09/2020 | 307 |
| 85 | 13/11/2020 | 306 |
| 86 | 26/06/2020 | 305 |
| 87 | 18/02/2020 | 305 |
| 88 | 21/02/2020 | 304 |
| 89 | 29/02/2020 | 303 |
| 90 | 18/12/2020 | 303 |
| 91 | 07/01/2020 | 303 |
| 92 | 03/08/2020 | 303 |
| 93 | 10/08/2020 | 302 |
| 94 | 07/10/2020 | 301 |
| 95 | 24/07/2020 | 301 |
| 96 | 08/01/2020 | 301 |
| 97 | 20/11/2020 | 301 |
| 98 | 05/10/2020 | 301 |
| 99 | 02/11/2020 | 300 |
| 100 | 11/03/2020 | 300 |
| 101 | 22/10/2020 | 299 |
| 102 | 19/08/2020 | 298 |
| 103 | 27/08/2020 | 298 |
| 104 | 23/09/2020 | 298 |
| 105 | 11/02/2020 | 298 |
| 106 | 12/03/2020 | 297 |
| 107 | 28/09/2020 | 297 |
| 108 | 14/12/2020 | 297 |
| 109 | 29/08/2020 | 297 |
| 110 | 06/10/2020 | 297 |
| 111 | 01/08/2020 | 296 |
| 112 | 14/08/2020 | 296 |
| 113 | 05/09/2020 | 296 |
| 114 | 27/11/2020 | 296 |
| 115 | 22/01/2020 | 295 |
| 116 | 06/08/2020 | 295 |
| 117 | 30/10/2020 | 295 |
| 118 | 03/10/2020 | 294 |
| 119 | 31/10/2020 | 292 |
| 120 | 01/09/2020 | 292 |
| 121 | 20/10/2020 | 292 |
| 122 | 07/12/2020 | 292 |
| 123 | 08/02/2020 | 292 |
| 124 | 16/12/2020 | 291 |
| 125 | 08/10/2020 | 291 |
| 126 | 10/02/2020 | 290 |
| 127 | 03/02/2020 | 290 |
| 128 | 22/07/2020 | 290 |
| 129 | 05/12/2020 | 289 |
| 130 | 01/02/2020 | 289 |
| 131 | 17/02/2020 | 287 |
| 132 | 13/10/2020 | 287 |
| 133 | 21/08/2020 | 284 |
| 134 | 18/11/2020 | 284 |
| 135 | 18/07/2020 | 283 |
| 136 | 19/01/2020 | 283 |
| 137 | 14/10/2020 | 282 |
| 138 | 23/06/2020 | 281 |
| 139 | 06/11/2020 | 281 |
| 140 | 13/08/2020 | 280 |
| 141 | 24/08/2020 | 280 |
| 142 | 12/10/2020 | 280 |
| 143 | 08/09/2020 | 279 |
| 144 | 15/08/2020 | 279 |
| 145 | 03/09/2020 | 279 |
| 146 | 20/09/2020 | 278 |
| 147 | 19/11/2020 | 277 |
| 148 | 10/10/2020 | 277 |
| 149 | 22/08/2020 | 277 |
| 150 | 13/03/2020 | 277 |
| 151 | 30/11/2020 | 276 |
| 152 | 20/06/2020 | 275 |
| 153 | 09/08/2020 | 275 |
| 154 | 23/11/2020 | 275 |
| 155 | 24/11/2020 | 274 |
| 156 | 13/09/2020 | 273 |
| 157 | 30/08/2020 | 272 |
| 158 | 20/02/2020 | 272 |
| 159 | 24/02/2020 | 272 |
| 160 | 26/10/2020 | 272 |
| 161 | 16/11/2020 | 272 |
| 162 | 18/08/2020 | 271 |
| 163 | 25/07/2020 | 271 |
| 164 | 29/05/2020 | 268 |
| 165 | 23/12/2020 | 268 |
| 166 | 19/12/2020 | 268 |
| 167 | 25/01/2020 | 267 |
| 168 | 28/10/2020 | 267 |
| 169 | 26/09/2020 | 267 |
| 170 | 30/05/2020 | 266 |
| 171 | 21/10/2020 | 266 |
| 172 | 24/12/2020 | 266 |
| 173 | 06/07/2020 | 265 |
| 174 | 12/11/2020 | 265 |
| 175 | 24/10/2020 | 265 |
| 176 | 10/03/2020 | 265 |
| 177 | 21/07/2020 | 265 |
| 178 | 26/11/2020 | 265 |
| 179 | 19/06/2020 | 264 |
| 180 | 20/07/2020 | 263 |
| 181 | 04/08/2020 | 260 |
| 182 | 25/10/2020 | 256 |
| 183 | 02/08/2020 | 256 |
| 184 | 26/08/2020 | 256 |
| 185 | 23/07/2020 | 255 |
| 186 | 06/01/2020 | 255 |
| 187 | 07/03/2020 | 254 |
| 188 | 17/08/2020 | 254 |
| 189 | 06/09/2020 | 254 |
| 190 | 28/07/2020 | 254 |
| 191 | 21/12/2020 | 253 |
| 192 | 08/07/2020 | 253 |
| 193 | 16/08/2020 | 253 |
| 194 | 09/11/2020 | 253 |
| 195 | 07/09/2020 | 253 |
| 196 | 04/07/2020 | 252 |
| 197 | 20/05/2020 | 252 |
| 198 | 22/06/2020 | 251 |
| 199 | 29/07/2020 | 251 |
| 200 | 10/11/2020 | 251 |
| 201 | 25/08/2020 | 249 |
| 202 | 17/10/2020 | 249 |
| 203 | 03/07/2020 | 248 |
| 204 | 12/12/2020 | 248 |
| 205 | 29/10/2020 | 247 |
| 206 | 27/10/2020 | 246 |
| 207 | 15/02/2020 | 246 |
| 208 | 05/11/2020 | 246 |
| 209 | 04/10/2020 | 246 |
| 210 | 16/03/2020 | 244 |
| 211 | 19/10/2020 | 243 |
| 212 | 07/11/2020 | 242 |
| 213 | 19/02/2020 | 242 |
| 214 | 31/08/2020 | 241 |
| 215 | 18/06/2020 | 241 |
| 216 | 01/06/2020 | 240 |
| 217 | 22/12/2020 | 240 |
| 218 | 31/05/2020 | 239 |
| 219 | 11/10/2020 | 239 |
| 220 | 28/05/2020 | 239 |
| 221 | 16/07/2020 | 238 |
| 222 | 19/07/2020 | 238 |
| 223 | 08/03/2020 | 238 |
| 224 | 02/06/2020 | 238 |
| 225 | 03/01/2020 | 238 |
| 226 | 01/03/2020 | 237 |
| 227 | 11/01/2020 | 237 |
| 228 | 04/01/2020 | 236 |
| 229 | 25/11/2020 | 236 |
| 230 | 27/07/2020 | 235 |
| 231 | 13/06/2020 | 235 |
| 232 | 11/11/2020 | 234 |
| 233 | 05/06/2020 | 234 |
| 234 | 13/07/2020 | 234 |
| 235 | 14/03/2020 | 234 |
| 236 | 26/01/2020 | 233 |
| 237 | 27/05/2020 | 231 |
| 238 | 27/09/2020 | 230 |
| 239 | 16/06/2020 | 230 |
| 240 | 02/02/2020 | 229 |
| 241 | 17/06/2020 | 229 |
| 242 | 02/07/2020 | 228 |
| 243 | 15/07/2020 | 227 |
| 244 | 16/02/2020 | 225 |
| 245 | 07/07/2020 | 223 |
| 246 | 09/07/2020 | 223 |
| 247 | 02/01/2020 | 222 |
| 248 | 26/07/2020 | 222 |
| 249 | 14/07/2020 | 221 |
| 250 | 01/11/2020 | 221 |
| 251 | 17/11/2020 | 219 |
| 252 | 30/06/2020 | 218 |
| 253 | 22/02/2020 | 217 |
| 254 | 01/07/2020 | 216 |
| 255 | 06/12/2020 | 215 |
| 256 | 15/06/2020 | 215 |
| 257 | 22/05/2020 | 214 |
| 258 | 27/06/2020 | 214 |
| 259 | 14/11/2020 | 213 |
| 260 | 05/07/2020 | 209 |
| 261 | 25/05/2020 | 208 |
| 262 | 23/08/2020 | 207 |
| 263 | 20/03/2020 | 206 |
| 264 | 26/05/2020 | 205 |
| 265 | 17/03/2020 | 205 |
| 266 | 12/01/2020 | 202 |
| 267 | 23/02/2020 | 201 |
| 268 | 21/05/2020 | 200 |
| 269 | 18/10/2020 | 200 |
| 270 | 13/12/2020 | 199 |
| 271 | 29/06/2020 | 198 |
| 272 | 28/11/2020 | 198 |
| 273 | 12/06/2020 | 198 |
| 274 | 15/05/2020 | 197 |
| 275 | 01/01/2020 | 196 |
| 276 | 04/06/2020 | 196 |
| 277 | 11/06/2020 | 196 |
| 278 | 19/03/2020 | 196 |
| 279 | 31/12/2020 | 196 |
| 280 | 18/03/2020 | 195 |
| 281 | 19/05/2020 | 194 |
| 282 | 21/11/2020 | 192 |
| 283 | 28/06/2020 | 191 |
| 284 | 10/06/2020 | 190 |
| 285 | 24/05/2020 | 188 |
| 286 | 09/02/2020 | 188 |
| 287 | 15/03/2020 | 185 |
| 288 | 06/06/2020 | 183 |
| 289 | 22/11/2020 | 181 |
| 290 | 18/05/2020 | 179 |
| 291 | 23/05/2020 | 179 |
| 292 | 14/06/2020 | 178 |
| 293 | 21/06/2020 | 177 |
| 294 | 20/12/2020 | 177 |
| 295 | 14/05/2020 | 177 |
| 296 | 07/05/2020 | 176 |
| 297 | 21/03/2020 | 175 |
| 298 | 03/06/2020 | 175 |
| 299 | 06/05/2020 | 171 |
| 300 | 09/06/2020 | 171 |
| 301 | 23/03/2020 | 171 |
| 302 | 07/06/2020 | 169 |
| 303 | 08/11/2020 | 168 |
| 304 | 24/04/2020 | 166 |
| 305 | 15/11/2020 | 164 |
| 306 | 27/12/2020 | 163 |
| 307 | 12/05/2020 | 162 |
| 308 | 08/06/2020 | 161 |
| 309 | 16/05/2020 | 161 |
| 310 | 22/03/2020 | 161 |
| 311 | 28/12/2020 | 158 |
| 312 | 17/05/2020 | 157 |
| 313 | 30/12/2020 | 157 |
| 314 | 09/05/2020 | 156 |
| 315 | 01/05/2020 | 156 |
| 316 | 29/11/2020 | 154 |
| 317 | 05/01/2020 | 154 |
| 318 | 08/05/2020 | 151 |
| 319 | 13/05/2020 | 150 |
| 320 | 22/04/2020 | 148 |
| 321 | 05/05/2020 | 147 |
| 322 | 02/05/2020 | 146 |
| 323 | 25/04/2020 | 144 |
| 324 | 29/12/2020 | 140 |
| 325 | 28/04/2020 | 135 |
| 326 | 14/04/2020 | 134 |
| 327 | 30/04/2020 | 133 |
| 328 | 26/04/2020 | 133 |
| 329 | 23/04/2020 | 132 |
| 330 | 25/12/2020 | 125 |
| 331 | 21/04/2020 | 125 |
| 332 | 04/05/2020 | 125 |
| 333 | 11/05/2020 | 123 |
| 334 | 15/04/2020 | 121 |
| 335 | 26/12/2020 | 117 |
| 336 | 11/04/2020 | 115 |
| 337 | 27/03/2020 | 113 |
| 338 | 16/04/2020 | 113 |
| 339 | 09/04/2020 | 113 |
| 340 | 27/04/2020 | 110 |
| 341 | 07/04/2020 | 110 |
| 342 | 10/04/2020 | 108 |
| 343 | 20/04/2020 | 108 |
| 344 | 25/03/2020 | 104 |
| 345 | 29/04/2020 | 103 |
| 346 | 17/04/2020 | 100 |
| 347 | 31/03/2020 | 99 |
| 348 | 10/05/2020 | 99 |
| 349 | 24/03/2020 | 99 |
| 350 | 19/04/2020 | 98 |
| 351 | 06/04/2020 | 98 |
| 352 | 26/03/2020 | 96 |
| 353 | 02/04/2020 | 89 |
| 354 | 08/04/2020 | 88 |
| 355 | 18/04/2020 | 86 |
| 356 | 04/04/2020 | 84 |
| 357 | 13/04/2020 | 84 |
| 358 | 12/04/2020 | 83 |
| 359 | 03/05/2020 | 83 |
| 360 | 03/04/2020 | 81 |
| 361 | 05/04/2020 | 78 |
| 362 | 01/04/2020 | 78 |
| 363 | 30/03/2020 | 58 |
| 364 | 28/03/2020 | 55 |
| 365 | 29/03/2020 | 46 |
The table gives the total number of accidents per day, but the rows are ordered by count rather than by date, and the Date column is still stored as an object (string). Let's convert it to datetime and sort the rows chronologically.
timefreq['Date'] = pd.to_datetime(timefreq['Date'], format='%d/%m/%Y')
timefreq = timefreq.sort_values(by='Date')
timefreq
| | Date | Total |
|---|---|---|
| 275 | 2020-01-01 | 196 |
| 247 | 2020-01-02 | 222 |
| 225 | 2020-01-03 | 238 |
| 228 | 2020-01-04 | 236 |
| 317 | 2020-01-05 | 154 |
| 186 | 2020-01-06 | 255 |
| 91 | 2020-01-07 | 303 |
| 96 | 2020-01-08 | 301 |
| 34 | 2020-01-09 | 340 |
| 4 | 2020-01-10 | 397 |
| 227 | 2020-01-11 | 237 |
| 266 | 2020-01-12 | 202 |
| 18 | 2020-01-13 | 370 |
| 7 | 2020-01-14 | 395 |
| 16 | 2020-01-15 | 374 |
| 65 | 2020-01-16 | 317 |
| 6 | 2020-01-17 | 396 |
| 25 | 2020-01-18 | 357 |
| 136 | 2020-01-19 | 283 |
| 9 | 2020-01-20 | 386 |
| 3 | 2020-01-21 | 399 |
| 115 | 2020-01-22 | 295 |
| 48 | 2020-01-23 | 332 |
| 13 | 2020-01-24 | 381 |
| 167 | 2020-01-25 | 267 |
| 236 | 2020-01-26 | 233 |
| 47 | 2020-01-27 | 333 |
| 26 | 2020-01-28 | 356 |
| 39 | 2020-01-29 | 337 |
| 36 | 2020-01-30 | 338 |
| 22 | 2020-01-31 | 361 |
| 130 | 2020-02-01 | 289 |
| 240 | 2020-02-02 | 229 |
| 127 | 2020-02-03 | 290 |
| 67 | 2020-02-04 | 316 |
| 38 | 2020-02-05 | 337 |
| 0 | 2020-02-06 | 426 |
| 8 | 2020-02-07 | 391 |
| 123 | 2020-02-08 | 292 |
| 286 | 2020-02-09 | 188 |
| 126 | 2020-02-10 | 290 |
| 105 | 2020-02-11 | 298 |
| 28 | 2020-02-12 | 352 |
| 53 | 2020-02-13 | 328 |
| 64 | 2020-02-14 | 318 |
| 207 | 2020-02-15 | 246 |
| 244 | 2020-02-16 | 225 |
| 131 | 2020-02-17 | 287 |
| 87 | 2020-02-18 | 305 |
| 213 | 2020-02-19 | 242 |
| 158 | 2020-02-20 | 272 |
| 88 | 2020-02-21 | 304 |
| 253 | 2020-02-22 | 217 |
| 267 | 2020-02-23 | 201 |
| 159 | 2020-02-24 | 272 |
| 60 | 2020-02-25 | 326 |
| 31 | 2020-02-26 | 347 |
| 55 | 2020-02-27 | 327 |
| 54 | 2020-02-28 | 328 |
| 89 | 2020-02-29 | 303 |
| 226 | 2020-03-01 | 237 |
| 45 | 2020-03-02 | 334 |
| 62 | 2020-03-03 | 324 |
| 79 | 2020-03-04 | 310 |
| 30 | 2020-03-05 | 347 |
| 2 | 2020-03-06 | 411 |
| 187 | 2020-03-07 | 254 |
| 223 | 2020-03-08 | 238 |
| 69 | 2020-03-09 | 315 |
| 176 | 2020-03-10 | 265 |
| 100 | 2020-03-11 | 300 |
| 106 | 2020-03-12 | 297 |
| 150 | 2020-03-13 | 277 |
| 235 | 2020-03-14 | 234 |
| 287 | 2020-03-15 | 185 |
| 210 | 2020-03-16 | 244 |
| 265 | 2020-03-17 | 205 |
| 280 | 2020-03-18 | 195 |
| 278 | 2020-03-19 | 196 |
| 263 | 2020-03-20 | 206 |
| 297 | 2020-03-21 | 175 |
| 310 | 2020-03-22 | 161 |
| 301 | 2020-03-23 | 171 |
| 349 | 2020-03-24 | 99 |
| 344 | 2020-03-25 | 104 |
| 352 | 2020-03-26 | 96 |
| 337 | 2020-03-27 | 113 |
| 364 | 2020-03-28 | 55 |
| 365 | 2020-03-29 | 46 |
| 363 | 2020-03-30 | 58 |
| 347 | 2020-03-31 | 99 |
| 362 | 2020-04-01 | 78 |
| 353 | 2020-04-02 | 89 |
| 360 | 2020-04-03 | 81 |
| 356 | 2020-04-04 | 84 |
| 361 | 2020-04-05 | 78 |
| 351 | 2020-04-06 | 98 |
| 341 | 2020-04-07 | 110 |
| 354 | 2020-04-08 | 88 |
| 339 | 2020-04-09 | 113 |
| 342 | 2020-04-10 | 108 |
| 336 | 2020-04-11 | 115 |
| 358 | 2020-04-12 | 83 |
| 357 | 2020-04-13 | 84 |
| 326 | 2020-04-14 | 134 |
| 334 | 2020-04-15 | 121 |
| 338 | 2020-04-16 | 113 |
| 346 | 2020-04-17 | 100 |
| 355 | 2020-04-18 | 86 |
| 350 | 2020-04-19 | 98 |
| 343 | 2020-04-20 | 108 |
| 331 | 2020-04-21 | 125 |
| 320 | 2020-04-22 | 148 |
| 329 | 2020-04-23 | 132 |
| 304 | 2020-04-24 | 166 |
| 323 | 2020-04-25 | 144 |
| 328 | 2020-04-26 | 133 |
| 340 | 2020-04-27 | 110 |
| 325 | 2020-04-28 | 135 |
| 345 | 2020-04-29 | 103 |
| 327 | 2020-04-30 | 133 |
| 315 | 2020-05-01 | 156 |
| 322 | 2020-05-02 | 146 |
| 359 | 2020-05-03 | 83 |
| 332 | 2020-05-04 | 125 |
| 321 | 2020-05-05 | 147 |
| 299 | 2020-05-06 | 171 |
| 296 | 2020-05-07 | 176 |
| 318 | 2020-05-08 | 151 |
| 314 | 2020-05-09 | 156 |
| 348 | 2020-05-10 | 99 |
| 333 | 2020-05-11 | 123 |
| 307 | 2020-05-12 | 162 |
| 319 | 2020-05-13 | 150 |
| 295 | 2020-05-14 | 177 |
| 274 | 2020-05-15 | 197 |
| 309 | 2020-05-16 | 161 |
| 312 | 2020-05-17 | 157 |
| 290 | 2020-05-18 | 179 |
| 281 | 2020-05-19 | 194 |
| 197 | 2020-05-20 | 252 |
| 268 | 2020-05-21 | 200 |
| 257 | 2020-05-22 | 214 |
| 291 | 2020-05-23 | 179 |
| 285 | 2020-05-24 | 188 |
| 261 | 2020-05-25 | 208 |
| 264 | 2020-05-26 | 205 |
| 237 | 2020-05-27 | 231 |
| 220 | 2020-05-28 | 239 |
| 164 | 2020-05-29 | 268 |
| 170 | 2020-05-30 | 266 |
| 218 | 2020-05-31 | 239 |
| 216 | 2020-06-01 | 240 |
| 224 | 2020-06-02 | 238 |
| 298 | 2020-06-03 | 175 |
| 276 | 2020-06-04 | 196 |
| 233 | 2020-06-05 | 234 |
| 288 | 2020-06-06 | 183 |
| 302 | 2020-06-07 | 169 |
| 308 | 2020-06-08 | 161 |
| 300 | 2020-06-09 | 171 |
| 284 | 2020-06-10 | 190 |
| 277 | 2020-06-11 | 196 |
| 273 | 2020-06-12 | 198 |
| 231 | 2020-06-13 | 235 |
| 292 | 2020-06-14 | 178 |
| 256 | 2020-06-15 | 215 |
| 239 | 2020-06-16 | 230 |
| 241 | 2020-06-17 | 229 |
| 215 | 2020-06-18 | 241 |
| 179 | 2020-06-19 | 264 |
| 152 | 2020-06-20 | 275 |
| 293 | 2020-06-21 | 177 |
| 198 | 2020-06-22 | 251 |
| 138 | 2020-06-23 | 281 |
| 83 | 2020-06-24 | 308 |
| 23 | 2020-06-25 | 360 |
| 86 | 2020-06-26 | 305 |
| 258 | 2020-06-27 | 214 |
| 283 | 2020-06-28 | 191 |
| 271 | 2020-06-29 | 198 |
| 252 | 2020-06-30 | 218 |
| 254 | 2020-07-01 | 216 |
| 242 | 2020-07-02 | 228 |
| 203 | 2020-07-03 | 248 |
| 196 | 2020-07-04 | 252 |
| 260 | 2020-07-05 | 209 |
| 173 | 2020-07-06 | 265 |
| 245 | 2020-07-07 | 223 |
| 192 | 2020-07-08 | 253 |
| 246 | 2020-07-09 | 223 |
| 57 | 2020-07-10 | 327 |
| 70 | 2020-07-11 | 314 |
| 77 | 2020-07-12 | 310 |
| 234 | 2020-07-13 | 234 |
| 249 | 2020-07-14 | 221 |
| 243 | 2020-07-15 | 227 |
| 221 | 2020-07-16 | 238 |
| 46 | 2020-07-17 | 334 |
| 135 | 2020-07-18 | 283 |
| 222 | 2020-07-19 | 238 |
| 180 | 2020-07-20 | 263 |
| 177 | 2020-07-21 | 265 |
| 128 | 2020-07-22 | 290 |
| 185 | 2020-07-23 | 255 |
| 95 | 2020-07-24 | 301 |
| 163 | 2020-07-25 | 271 |
| 248 | 2020-07-26 | 222 |
| 230 | 2020-07-27 | 235 |
| 190 | 2020-07-28 | 254 |
| 199 | 2020-07-29 | 251 |
| 51 | 2020-07-30 | 330 |
| 14 | 2020-07-31 | 377 |
| 111 | 2020-08-01 | 296 |
| 183 | 2020-08-02 | 256 |
| 92 | 2020-08-03 | 303 |
| 181 | 2020-08-04 | 260 |
| 82 | 2020-08-05 | 309 |
| 116 | 2020-08-06 | 295 |
| 49 | 2020-08-07 | 331 |
| 61 | 2020-08-08 | 325 |
| 153 | 2020-08-09 | 275 |
| 93 | 2020-08-10 | 302 |
| 75 | 2020-08-11 | 311 |
| 80 | 2020-08-12 | 310 |
| 140 | 2020-08-13 | 280 |
| 112 | 2020-08-14 | 296 |
| 144 | 2020-08-15 | 279 |
| 193 | 2020-08-16 | 253 |
| 188 | 2020-08-17 | 254 |
| 162 | 2020-08-18 | 271 |
| 102 | 2020-08-19 | 298 |
| 29 | 2020-08-20 | 351 |
| 133 | 2020-08-21 | 284 |
| 149 | 2020-08-22 | 277 |
| 262 | 2020-08-23 | 207 |
| 141 | 2020-08-24 | 280 |
| 201 | 2020-08-25 | 249 |
| 184 | 2020-08-26 | 256 |
| 103 | 2020-08-27 | 298 |
| 24 | 2020-08-28 | 359 |
| 109 | 2020-08-29 | 297 |
| 157 | 2020-08-30 | 272 |
| 214 | 2020-08-31 | 241 |
| 120 | 2020-09-01 | 292 |
| 78 | 2020-09-02 | 310 |
| 145 | 2020-09-03 | 279 |
| 76 | 2020-09-04 | 311 |
| 113 | 2020-09-05 | 296 |
| 189 | 2020-09-06 | 254 |
| 195 | 2020-09-07 | 253 |
| 143 | 2020-09-08 | 279 |
| 63 | 2020-09-09 | 320 |
| 52 | 2020-09-10 | 329 |
| 81 | 2020-09-11 | 309 |
| 71 | 2020-09-12 | 313 |
| 156 | 2020-09-13 | 273 |
| 33 | 2020-09-14 | 341 |
| 20 | 2020-09-15 | 367 |
| 84 | 2020-09-16 | 307 |
| 32 | 2020-09-17 | 344 |
| 5 | 2020-09-18 | 397 |
| 68 | 2020-09-19 | 316 |
| 146 | 2020-09-20 | 278 |
| 58 | 2020-09-21 | 327 |
| 66 | 2020-09-22 | 316 |
| 104 | 2020-09-23 | 298 |
| 37 | 2020-09-24 | 338 |
| 19 | 2020-09-25 | 369 |
| 169 | 2020-09-26 | 267 |
| 238 | 2020-09-27 | 230 |
| 107 | 2020-09-28 | 297 |
| 73 | 2020-09-29 | 312 |
| 40 | 2020-09-30 | 336 |
| 59 | 2020-10-01 | 326 |
| 15 | 2020-10-02 | 375 |
| 118 | 2020-10-03 | 294 |
| 209 | 2020-10-04 | 246 |
| 98 | 2020-10-05 | 301 |
| 110 | 2020-10-06 | 297 |
| 94 | 2020-10-07 | 301 |
| 125 | 2020-10-08 | 291 |
| 12 | 2020-10-09 | 381 |
| 148 | 2020-10-10 | 277 |
| 219 | 2020-10-11 | 239 |
| 142 | 2020-10-12 | 280 |
| 132 | 2020-10-13 | 287 |
| 137 | 2020-10-14 | 282 |
| 42 | 2020-10-15 | 336 |
| 56 | 2020-10-16 | 327 |
| 202 | 2020-10-17 | 249 |
| 269 | 2020-10-18 | 200 |
| 211 | 2020-10-19 | 243 |
| 121 | 2020-10-20 | 292 |
| 171 | 2020-10-21 | 266 |
| 101 | 2020-10-22 | 299 |
| 41 | 2020-10-23 | 336 |
| 175 | 2020-10-24 | 265 |
| 182 | 2020-10-25 | 256 |
| 160 | 2020-10-26 | 272 |
| 206 | 2020-10-27 | 246 |
| 168 | 2020-10-28 | 267 |
| 205 | 2020-10-29 | 247 |
| 117 | 2020-10-30 | 295 |
| 119 | 2020-10-31 | 292 |
| 250 | 2020-11-01 | 221 |
| 99 | 2020-11-02 | 300 |
| 27 | 2020-11-03 | 352 |
| 1 | 2020-11-04 | 414 |
| 208 | 2020-11-05 | 246 |
| 139 | 2020-11-06 | 281 |
| 212 | 2020-11-07 | 242 |
| 303 | 2020-11-08 | 168 |
| 194 | 2020-11-09 | 253 |
| 200 | 2020-11-10 | 251 |
| 232 | 2020-11-11 | 234 |
| 174 | 2020-11-12 | 265 |
| 85 | 2020-11-13 | 306 |
| 259 | 2020-11-14 | 213 |
| 305 | 2020-11-15 | 164 |
| 161 | 2020-11-16 | 272 |
| 251 | 2020-11-17 | 219 |
| 134 | 2020-11-18 | 284 |
| 147 | 2020-11-19 | 277 |
| 97 | 2020-11-20 | 301 |
| 282 | 2020-11-21 | 192 |
| 289 | 2020-11-22 | 181 |
| 154 | 2020-11-23 | 275 |
| 155 | 2020-11-24 | 274 |
| 229 | 2020-11-25 | 236 |
| 178 | 2020-11-26 | 265 |
| 114 | 2020-11-27 | 296 |
| 272 | 2020-11-28 | 198 |
| 316 | 2020-11-29 | 154 |
| 151 | 2020-11-30 | 276 |
| 44 | 2020-12-01 | 335 |
| 43 | 2020-12-02 | 336 |
| 11 | 2020-12-03 | 382 |
| 10 | 2020-12-04 | 385 |
| 129 | 2020-12-05 | 289 |
| 255 | 2020-12-06 | 215 |
| 122 | 2020-12-07 | 292 |
| 74 | 2020-12-08 | 312 |
| 72 | 2020-12-09 | 313 |
| 35 | 2020-12-10 | 339 |
| 17 | 2020-12-11 | 372 |
| 204 | 2020-12-12 | 248 |
| 270 | 2020-12-13 | 199 |
| 108 | 2020-12-14 | 297 |
| 21 | 2020-12-15 | 362 |
| 124 | 2020-12-16 | 291 |
| 50 | 2020-12-17 | 330 |
| 90 | 2020-12-18 | 303 |
| 166 | 2020-12-19 | 268 |
| 294 | 2020-12-20 | 177 |
| 191 | 2020-12-21 | 253 |
| 217 | 2020-12-22 | 240 |
| 165 | 2020-12-23 | 268 |
| 172 | 2020-12-24 | 266 |
| 330 | 2020-12-25 | 125 |
| 335 | 2020-12-26 | 117 |
| 306 | 2020-12-27 | 163 |
| 311 | 2020-12-28 | 158 |
| 324 | 2020-12-29 | 140 |
| 313 | 2020-12-30 | 157 |
| 279 | 2020-12-31 | 196 |
Now we can plot them in a graph.
fig = px.line(timefreq, x='Date', y='Total')
fig.show()
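The daily totals fluctuate strongly from one day to the next, which can make the longer trend harder to read. As a minimal sketch, assuming a frame shaped like timefreq with Date and Total columns, a 7-day rolling mean smooths these fluctuations. The frame below holds only the first ten days of the sorted table, and the names `demo` and `Rolling7` are introduced here purely for illustration.

```python
import pandas as pd

# Small stand-in for timefreq: the first ten days of the sorted table
demo = pd.DataFrame({
    'Date': pd.date_range('2020-01-01', periods=10, freq='D'),
    'Total': [196, 222, 238, 236, 154, 255, 303, 301, 340, 397],
})

# Make sure the rows are in chronological order before rolling
demo = demo.sort_values('Date')

# Mean over the current day and the six days before it;
# the first six rows stay NaN because the window is not yet full
demo['Rolling7'] = demo['Total'].rolling(window=7).mean()

print(demo)
```

The smoothed column could then be drawn next to the raw totals, for example by passing `y=['Total', 'Rolling7']` to `px.line`.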
The remaining set of features is large, so we first look at the general distribution of each feature. A selection of features can then be examined in more detail.
sns.set(font_scale=1.5)
fig, ax = plt.subplots(15, 2, figsize=(25, 110))
# Draw one distribution plot per numeric column, filling the grid row by row
i = 0
for col in characteristics.select_dtypes(include=['float64', 'int64']):
    sns.distplot(characteristics[col], label=col, ax=ax[i // 2][i % 2])
    i = i + 1
fig.show()
- accident_year: This feature can be disregarded for now, as all events took place in 2020.
- location_easting_osgr: We observe a spike between 50 degrees and 60 degrees.
- location_northing_osgr: We observe another spike, which could be correlated with the spike in location_easting_osgr.
- longitude: We observe a spike that corresponds to location_easting_osgr. These are the same feature in a different format (longitude is preferred).
- latitude: We observe a spike that corresponds to location_northing_osgr. These are the same feature in a different format (latitude is preferred).
- police_force: We observe a spike for police force 1. This is the Metropolitan Police, which is located in London. This explains the spike, as London is a very highly concentrated area.
- accident_severity: We observe a steady increase, with severity type 3 being the most frequent. The codes are 1: Fatal, 2: Serious, 3: Slight.
- number_of_vehicles: We observe that most accidents involve 2 vehicles, with few to none involving more than 4.
- number_of_casualties: A massive spike at 1 casualty. This is because each data point contains at least 1 casualty; accidents with more than 1 are less common.
- day_of_week: We observe that most accidents occur on Saturday. This may be the day people go out the most in their free time.
- local_authority_district: We observe a spike in districts 0-20. These relate to police_force, as they are the districts within that force.
- first_road_class: A spike at 3 and 6, which correspond to road class A and Unknown. Class A roads are main roads that connect regional towns and cities [2].
- first_road_number: The 'name' of each road, given as a number. This relates to the location of the road.
- road_type: A spike at road type 6: Single carriageway. These are regional roads with a maximum speed of 30 mph (48 km/h) unless signs state otherwise [3]. What is odd is the relatively lower count for 3: Dual carriageway, which has the same speed limit. This could be because of more room and better visibility of the surrounding environment.
- speed_limit: Here we observe that most accidents occur at a 30 mph speed limit. This matches the high number of accidents on road_type 6 (single carriageway).
- junction_detail: Here we observe high rates for the categorized values 0-9. The small spike at 99 consists of cases for which the details are unknown (self-reported cases).
- junction_control: This feature describes how the flow of traffic is regulated, with -1 being data that is missing or out of range and 4 being give way or uncontrolled. With such a high number of unknown situations, this feature might not be useful.
- second_road_class: This feature is related to road_type. Further use will be made of the latter.
- second_road_number: This feature specifies the adjacent road, if there is one. We observe a massive spike at 1. No further use will be made of it, as first_road_number already gives a good indication of the road on a categorical level.
- pedestrian_crossing_human_control: Gives insight into the controlled crossing of pedestrians. Most cases are 0, which means none. This can be used to observe the rates for pedestrian casualties.
- pedestrian_crossing_physical_facilities: As with the previous feature, this can be used to observe the rates and the situation per incident, showing the use of crosswalks, lights, etc.
- light_conditions: We observe that most incidents occur during 1: Daylight, with a spike at 4: Darkness - lights lit.
- weather_conditions: Most incidents occur during either 1: Fine, no high winds or 2: Raining, no high winds.
- road_surface_conditions: Spikes at 1: Dry and 2: Damp. This is in line with weather_conditions.
- special_conditions_at_site: Specifies irregularities involving infrastructure. Most cases are 0: Unknown. For the small number of cases with a known condition, we could look at the fatality rate.
- carriageway_hazards: Same as the previous feature, with a small number of known data points. The rates can be observed for the available cases.
- urban_or_rural_area: We observe that most incidents occur in 1: Urban areas. This is consistent with the high case counts seen in police_force.
- did_police_officer_attend_scene_of_accident: A spike at 1, indicating that in most cases an officer did attend the scene. Could this be correlated with the fatality rate?
- trunk_road_flag: Most cases are 2: Non-trunk, meaning the roads are not managed by Highways England [4].

We can display the results on a map, using longitude and latitude, to check the concentration.
fig = px.scatter_mapbox(characteristics,
lat='latitude',
lon='longitude',
zoom=4.9,
height=800,
width=800)
fig.data[0]['marker'].update(color='red')
fig.data[0]['marker'].update(size=3)
fig.update_layout(mapbox_style='open-street-map')
fig.update_layout(margin={'r':0,'t':0,'l':0,'b':0})
fig.show()
What is not visible on this map is the actual distribution of some of these features, so we take a closer look at a few points of interest. We see a high concentration of cases near London, which would correspond to the Metropolitan Police force. Let's check the distribution of cases:
characteristics['police_force'].value_counts()[:5]
1     20906
20     3933
99     3836
46     3405
47     3107
Name: police_force, dtype: int64
Here we can indeed see that force 1 has the most cases. The reference table shows that this is the Metropolitan Police.
police = characteristics['police_force'].map({1: 'Metropolitan Police',
20: 'West Midlands',
99: 'Police Scotland',
46: 'Kent',
47: 'Sussex',
})
fig = px.histogram(police,
title='Accidents per police department',
nbins=6)
fig.update_layout(bargap=0.3)
fig.show()
Here we can see the police forces with the most incidents. On the map it is not immediately visible how large the difference in cases actually is, but the more you zoom in, the higher the concentration becomes. It would be interesting to see what the severity rates are.
severity = characteristics['accident_severity'].map({1: 'Fatal',
2: 'Serious',
3: 'Slight',
})
fig = px.pie(names=severity, color=severity,
title='Accident severity rate in accidents UK 2020',
color_discrete_map={
'Slight': 'darkgreen',
'Serious': 'orange',
'Fatal': 'red'
})
fig.update_traces(textposition='outside', textinfo='percent+label')
fig.show()
We observe that only a small share of cases are fatal. This seems plausible, as there are hundreds of small accidents each day and few of them end in a fatality. Another point of interest is the distribution of these cases per police force. The expectation is that the forces with the most accidents also have the most deaths.
police = characteristics['police_force'].map({1: 'Metropolitan Police',
20: 'West Midlands',
99: 'Police Scotland',
46: 'Kent',
47: 'Sussex',
})
fig = px.histogram(police, color=severity,
title='Accidents severity per top 5 police department',
nbins=6,
color_discrete_map={
'Slight': 'darkgreen',
'Serious': 'orange',
'Fatal': 'red'})
fig.update_layout(bargap=0.3)
fig.update_layout(barmode='group',xaxis={'categoryorder':'total descending'},)
fig.show()
Here we can observe that the Metropolitan Police does not have the highest number of fatal accidents; Police Scotland does. This could be because roads in Scotland are more dangerous than the city streets of London and the surrounding areas.
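Raw counts favour forces with many accidents, so a comparison of the share of fatal accidents within each force can complement the grouped histogram. The following is a minimal sketch on synthetic data, assuming the police_force and accident_severity columns and codes described above. The values in `demo` are made up for illustration and say nothing about the real dataset.

```python
import pandas as pd

# Synthetic stand-in for characteristics (values are illustrative only)
demo = pd.DataFrame({
    'police_force':      [1, 1, 1, 1, 99, 99, 20, 20],
    'accident_severity': [3, 3, 2, 1, 1, 3, 3, 2],
})

# Share of each severity code within each police force (each row sums to 1)
severity_share = pd.crosstab(demo['police_force'],
                             demo['accident_severity'],
                             normalize='index')

# Column 1 is the code for 'Fatal'; sort forces by their fatal share
fatal_share = severity_share[1].sort_values(ascending=False)
print(fatal_share)
```

Applied to the real characteristics frame, the same crosstab would show whether a force stands out because of its accident volume or because of a higher fatal share.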
roadType = characteristics['road_type'].map({1: 'Roundabout',
2: 'One way street',
3: 'Dual carriageway',
6: 'Single carriageway',
7: 'Slip road',
9: 'Unknown',
12: 'One way street/Slip road',
-1: 'Data missing or out of range',
})
fig = px.histogram(roadType, color=severity,
title='Accidents per road type',
nbins=8,
color_discrete_map={
'Slight': 'darkgreen',
'Serious': 'orange',
'Fatal': 'red',
},
)
fig.update_layout(barmode='group',
bargap=0.3,
xaxis={'categoryorder':'total descending'},
)
fig.show()
Here we see again that most cases happen on single carriageways. These usually form dense road networks connecting various places.
fig = px.histogram(characteristics[characteristics['number_of_casualties'] < 14 ]['number_of_casualties'],
title='Number of casualties per accident',
nbins=20)
fig.update_layout(barmode='relative', bargap=0.3,xaxis={'categoryorder':'total descending'},)
fig.show()
Finally, we can confirm our assumption that each record in characteristics is linked to one or more casualties. These tables can later be merged so that the data contains the features of both.
In the end, we saw some features that are better described by another feature, or that are not usable because they lack information or correlation:
time, local_authority_ons_district, local_authority_highway, lsoa_of_accident_location, accident_year, location_easting_osgr, location_northing_osgr, second_road_class, second_road_number, road_surface_conditions, trunk_road_flag
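The two steps mentioned above, dropping the unused features and merging the tables on accident_reference, could look roughly like this. This is a sketch on tiny synthetic frames, not the notebook's actual merge. The names `characteristics_demo`, `casualty_demo` and `to_drop` are hypothetical, and only two of the listed columns are used to keep the example short.

```python
import pandas as pd

# Synthetic stand-ins; the real frames have many more columns
characteristics_demo = pd.DataFrame({
    'accident_reference': ['010219808', '010220496'],
    'accident_year': [2020, 2020],
    'accident_severity': [3, 2],
    'trunk_road_flag': [2, 2],
})
casualty_demo = pd.DataFrame({
    'accident_reference': ['010219808', '010219808', '010220496'],
    'casualty_severity': [3, 3, 2],
})

# Subset of the features listed above; errors='ignore' skips absent columns
to_drop = ['accident_year', 'trunk_road_flag']
trimmed = characteristics_demo.drop(columns=to_drop, errors='ignore')

# Each casualty row receives the features of its accident.
# validate raises if accident_reference is not unique on the right side.
merged = casualty_demo.merge(trimmed, on='accident_reference',
                             how='left', validate='many_to_one')
print(merged)
```

A left join from the casualty side keeps one row per casualty, which fits a target variable such as casualty_severity.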
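# Inspect every object (string) column of the casualty table
for col in casualty.select_dtypes('object'):
    print('/n')  # note: '/n' is printed as literal text, not as a newline
    print('Number of values in "', col, '"', {casualty[col].nunique()})
    print(casualty[col].unique())
    print('/n')
    print('------------------------------------------------')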
/n
Number of values in " accident_reference " {91199}
['010219808' '010220496' '010228005' ... '991030297' '991030900'
'991032575']
/n
------------------------------------------------
Here we observe that accident_reference is the only object column. It will later be used to link casualties to the characteristics. The other features will be displayed and described as a group.
sns.set(font_scale=1.5)
fig, ax = plt.subplots(8, 2, figsize=(25, 110))
# Draw one distribution plot per numeric column, filling the grid row by row
i = 0
for col in casualty.select_dtypes(include=['float64', 'int64']):
    sns.distplot(casualty[col], label=col, ax=ax[i // 2][i % 2])
    i = i + 1
fig.show()
- accident_year, vehicle_reference, casualty_reference: General descriptive features.
- casualty_class: Describes the type of casualty involved. We see a spike at 1: Driver or rider.
- sex_of_casualty: Gender of the casualty, with a higher count for 1: Male.
- age_of_casualty: We observe the highest rates between ages 20 and 40.
- age_band_of_casualty: Categorized age data. Here we observe that 6: 26-35 has the highest rates.
- casualty_severity: Corresponds to accident_severity from characteristics, but per individual casualty.
- pedestrian_location: Categorical data for the position of the pedestrian during the incident. Most cases are 0: unknown, where no pedestrians were involved.
- pedestrian_movement: Same as above, but for the intended movement.
- car_passenger: Most cases are 0: Not passenger. An accident can have several casualties, so the same accident_reference might appear in multiple casualty records.
- bus_or_coach_passenger: Same as above, but for buses or coaches.
- pedestrian_road_maintenance_worker: A large spike at 0: No/Not applicable and a small spike at 2: Not known. This feature will be removed, as there are few to no usable cases.
- casualty_type: The type of vehicle the casualty occupied during the incident. The largest spikes are at 0: Pedestrian and 9: Car occupant.
- casualty_home_area_type: Residence of the casualty, with the largest count at 1: Urban area.
- casualty_imd_decile: Deprivation rate of the individual [5], with a higher rate at 2: More deprived 10-20%.

severity = casualty['casualty_severity'].map({1: 'Fatal',
2: 'Serious',
3: 'Slight',
})
fig = px.pie(names=severity, color=severity,
title='Casualty severity rate in accidents UK 2020',
color_discrete_map={
'Slight': 'darkgreen',
'Serious': 'orange',
'Fatal': 'red'
})
fig.update_traces(textposition='outside', textinfo='percent+label')
fig.show()
Here we actually observe a lower share of fatal outcomes. This may be because accident_severity is based on the grouped outcome of the accident, whereas casualty_severity is the actual outcome per casualty. This would explain the higher share of slight severity and the lower share of fatal.
accident_year, pedestrian_road_maintenance_worker
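The difference between the two severity charts can be illustrated with a small example. It assumes that the severity of an accident equals the most severe outcome among its casualties; the source does not confirm this, so treat it as an assumption. The frame `casualty_demo` is synthetic and its values are made up.

```python
import pandas as pd

# Synthetic casualties: accident A has one fatal and two slight casualties
casualty_demo = pd.DataFrame({
    'accident_reference': ['A', 'A', 'A', 'B', 'C', 'C'],
    'casualty_severity':  [1, 3, 3, 3, 2, 3],
})

# Lower code = more severe, so min() picks the worst outcome per accident
# (assumed rule for deriving an accident-level severity)
worst = casualty_demo.groupby('accident_reference')['casualty_severity'].min()

# Severity shares counted per accident and per casualty
per_accident = worst.value_counts(normalize=True).sort_index()
per_casualty = (casualty_demo['casualty_severity']
                .value_counts(normalize=True)
                .sort_index())

print(per_accident)
print(per_casualty)
```

In this toy data one in three accidents is fatal, but only one in six casualties is, because the slight casualties of a fatal accident still count as slight at the casualty level.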
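# Inspect every object (string) column of the vehicles table
for col in vehicles.select_dtypes('object'):
    print('/n')  # note: '/n' is printed as literal text, not as a newline
    print('Number of values in "', col, '"', {vehicles[col].nunique()})
    print(vehicles[col].unique())
    print('/n')
    print('------------------------------------------------')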
/n
Number of values in " accident_reference " {91200}
[10219808 10220496 10228005 ... 991030297 991030900 991032575]
/n
------------------------------------------------
/n
Number of values in " generic_make_model " {692}
['AUDI Q5' 'AUDI A1' '-1' 'TOYOTA PRIUS' 'BMW 4 SERIES' 'VOLVO V60'
'MERCEDES C CLASS' 'JAGUAR XF SERIES' 'BMW X5' 'YAMAHA XC115'
'BMW 1 SERIES' 'PORSCHE CAYENNE' 'AUDI A4' 'VOLKSWAGEN CADDY' 'HONDA CRV'
'VAUXHALL CORSA' 'VOLKSWAGEN POLO' 'VOLKSWAGEN GOLF' 'BMW 5 SERIES'
'FIAT 500' 'MAZDA 323' 'HONDA INSIGHT' 'TOYOTA AYGO' 'TOYOTA YARIS'
'JAGUAR X TYPE' 'PEUGEOT TWEET 125' 'VOLVO MODEL MISSING' 'NISSAN NOTE'
'FORD KA' 'TOYOTA COROLLA' 'TOYOTA MODEL MISSING' 'MITSUBISHI OUTLANDER'
'FORD FOCUS' 'MERCEDES A CLASS' 'SMART FORTWO' 'AUDI A6' 'VAUXHALL MOKKA'
'HONDA GLR 125' 'LAND ROVER RANGE ROVER' 'BMW 6 SERIES' 'FIAT DUCATO'
'FORD FIESTA' 'PEUGEOT 108' 'ALEXANDER DENNIS MODEL MISSING'
'VOLKSWAGEN TRANSPORTER' 'CITROEN C4' 'FORD TRANSIT' 'PEUGEOT EXPERT'
'HONDA WW125' 'BMW M3' 'WRIGHTBUS NB4L' 'FORD TRANSIT CONNECT' 'AUDI A3'
'MERCEDES B CLASS' 'FORD KUGA' 'HONDA WW125EX2' 'HONDA SH125'
'PEUGEOT 107' 'HONDA NSC110E' 'RENAULT CLIO' 'TOYOTA PREVIA'
'SUZUKI WAGON R+' 'MAZDA 3' 'VOLKSWAGEN TIGUAN' 'KIA SORENTO'
'MERCEDES GLA CLASS' 'YAMAHA XJ6' 'KIA NIRO' 'PIAGGIO FLY'
'SEAT ALHAMBRA' 'BMW 3 SERIES' 'MINI ROADSTER' 'LONDON TAXIS INT. TX4'
'PIAGGIO VESPA' 'YAMAHA MT125' 'SAAB 9-3' 'MAZDA 2' 'AUDI SQ5'
'VAUXHALL ASTRA' 'PIAGGIO ZIP' 'HONDA NSC110MP' 'MERCEDES SPRINTER'
'SCANIA MODEL MISSING' 'HONDA SH 125' 'SUZUKI SWIFT' 'JAGUAR F-PACE'
'FORD TRANSIT CUSTOM' 'VAUXHALL VECTRA' 'NISSAN MODEL MISSING'
'MERCEDES GLC CLASS' 'RENAULT MEGANE' 'VOLKSWAGEN UP BY'
'VOLKSWAGEN SHARAN' 'MERCEDES VITO' 'RENAULT SCENIC' 'MERCEDES E CLASS'
'VOLKSWAGEN TOURAN' 'HONDA NSC110WH' 'TOYOTA AURIS' 'FIAT PUNTO'
'VOLKSWAGEN EOS' 'NISSAN X-TRAIL' 'PEUGEOT 208' 'NISSAN QASHQAI'
'SEAT IBIZA' 'SUZUKI RV 125' 'RENAULT CAPTUR' 'YAMAHA GPD 125'
'LEXUS RX400' 'PEUGEOT 207' 'BMW 2 SERIES' 'PIAGGIO TYPHOON'
'MINI COOPER' 'MINI ONE' 'HYUNDAI I30' 'VAUXHALL COMBO' 'HONDA NHX110WH'
'MERCEDES ML CLASS' 'LEXUS IS250' 'VAUXHALL INSIGNIA' 'RENAULT MASTER'
'CHEVROLET CAPTIVA' 'ALFA ROMEO GIULIA' 'VOLKSWAGEN UP' 'TOYOTA AVENSIS'
'VAUXHALL ZAFIRA' 'TOYOTA IQ' 'NISSAN JUKE'
'HARLEY-DAVIDSON MODEL MISSING' 'IVECO DAILY' 'PEUGEOT 2008'
'PEUGEOT 308' 'VAUXHALL VIVARO' 'WRIGHTBUS GEMINI' 'VAUXHALL ADAM'
'AUDI Q3' 'HONDA FES125' 'FORD MONDEO' 'MINI COUNTRYMAN' 'PEUGEOT 206'
'DENNIS MODEL MISSING' 'TOYOTA VERSO' 'HONDA CIVIC' 'MERCEDES CLA CLASS'
'NISSAN MICRA' 'BMW X1' 'FORD GALAXY' 'MERCEDES CLS CLASS'
'TRIUMPH BONNEVILLE' 'HONDA NSS 125 AD' 'TOYOTA PRIUS+' 'PIAGGIO LIBERTY'
'FORD FUSION' 'MERCEDES R CLASS' 'AUDI A5' 'YAMAHA NXC 125'
'HONDA NSC 110' 'SKODA YETI' 'VOLKSWAGEN PASSAT' 'CITROEN C1'
'SMART FORFOUR' 'BMW I3' 'YAMAHA MT09' 'KIA CEED' 'BMW M4' 'KIA OPTIMA'
'MAZDA CX5' 'HONDA CBF125M' 'ABARTH 500' 'AUDI TT'
'DAF TRUCKS MODEL MISSING' 'WRIGHTBUS STREETLITE' 'HONDA CBR900RR'
'PORSCHE MACAN' 'HONDA JAZZ' 'FORD MODEL MISSING' 'PORSCHE 911' 'AUDI A8'
'GILERA RUNNER 125' 'SUZUKI SV 650' 'HONDA ACCORD' 'TOYOTA RAV4'
'MERCEDES CLC CLASS' 'KAWASAKI ZX600' 'BMW X3' 'HARLEY-DAVIDSON FLS'
'NISSAN ALMERA' 'SMART CITY' 'YAMAHA MODEL MISSING' 'DODGE CALIBER'
'YAMAHA X-MAX 125' 'YAMAHA YZF R125' 'YAMAHA XC 125' 'AUDI A7'
'HONDA NC 750' 'NISSAN NAVARA' 'ALFA ROMEO GIULIETTA'
'VOLKSWAGEN SCIROCCO' 'MITSUBISHI SHOGUN' 'MERCEDES S CLASS' 'FORD C-MAX'
'RENAULT KANGOO' 'YAMAHA TRACER' 'LAND ROVER DISCOVERY'
'VOLKSWAGEN URBAN' 'PEUGEOT 407' 'FORD ECOSPORT' 'CITROEN BERLINGO'
'CHEVROLET MATIZ' 'NISSAN PULSAR' 'BMW 7 SERIES' 'PIAGGIO MP3'
'HONDA HR-V' 'AUDI Q2' 'PEUGEOT BIPPER' 'HONDA NC750XA' 'SKODA FABIA'
'SKODA OCTAVIA' 'HONDA SH 125 AD' 'VOLVO V70' 'MERCEDES MODEL MISSING'
'JEEP COMPASS' 'HONDA PES 125' 'PEUGEOT 307' 'LAND ROVER DEFENDER'
'HONDA NT700V' 'ISUZU TRUCKS FORWARD' 'VOLKSWAGEN T-ROC' 'PEUGEOT BOXER'
'VOLVO XC90' 'SEAT LEON' 'HONDA NSS' 'KIA PICANTO' 'MINI PACEMAN'
'KIA SPORTAGE' 'YAMAHA FZ6' 'PEUGEOT 5008' 'CITROEN DS5' 'HYUNDAI I10'
'YAMAHA YP 125' 'LEVC TX' 'SUZUKI GSXR 600' 'PEUGEOT 3008'
'IVECO EUROCARGO' 'VOLVO S60' 'CITROEN RELAY' 'MAN MODEL MISSING'
'FORD RANGER' 'VOLKSWAGEN CRAFTER' 'HYUNDAI IONIQ' 'FORD TOURNEO'
'TOYOTA C-HR' 'RENAULT KADJAR' 'KAWASAKI EX250' 'SKODA CITIGO'
'CITROEN SAXO' 'PORSCHE PANAMERA' 'YAMAHA FZ1' 'LAND ROVER FREELANDER'
'PEUGEOT PARTNER' 'VAUXHALL ASTRAVAN' 'RENAULT LAGUNA' 'MERCEDES CLK'
'MINI CLUBMAN' 'LONDON TAXIS INT. TXII' 'HONDA CBF125N' 'YAMAHA YBR 125'
'AUDI A2' 'CITROEN DISPATCH' 'VAUXHALL CROSSLAND' 'TRIUMPH TIGER'
'BMW R 1200' 'VAUXHALL AGILA' 'MITSUBISHI ASX' 'VAUXHALL MERIVA'
'KIA SOUL' 'HONDA CBR600RR' 'SKODA SUPERB' 'BENTLEY CONTINENTAL'
'ABARTH 595' 'HONDA MODEL MISSING' 'LEXUS IS220' 'VDL MODEL MISSING'
'YAMAHA X-MAX 300' 'NISSAN LEAF' 'DS DS3' 'MINI FIRST' 'JAGUAR I-PACE'
'FORD GRAND C-MAX' 'SUZUKI VITARA' 'CITROEN DS3' 'FORD S-MAX'
'CHEVROLET ORLANDO' 'LEXUS UX 250H' 'HONDA C90' 'JAGUAR S TYPE'
'RENAULT MODUS' 'SEAT TOLEDO' 'HONDA SH300A' 'BMW X4' 'VOLVO V90'
'SUZUKI MODEL MISSING' 'LEXUS CT 200' 'AUDI S5' 'SKODA KAROQ'
'CITROEN C2' 'FIAT DOBLO' 'MERCEDES VIANO' 'OPTARE MODEL MISSING'
'MAZDA 6' 'SUZUKI UK 110' 'HONDA ANC125E' 'YAMAHA YZF R1'
'PORSCHE BOXSTER' 'LEXUS IS300' 'TRIUMPH STREET TRIPLE' 'NISSAN E-NV200'
'VOLVO S40' 'NISSAN NV200' 'MAZDA 5' 'VAUXHALL MOVANO' 'KIA VENGA'
'FIAT PANDA' 'SUZUKI IGNIS' 'HYUNDAI I20' 'HONDA CBR125R' 'VOLVO V40'
'CHEVROLET SPARK' 'HYUNDAI I40' 'HYOSUNG GT 125'
'RENAULT TRUCKS MODEL MISSING' 'MINI JOHN COOPER WORKS' 'RENAULT ZOE'
'LEXMOTO LXR' 'CITROEN C3' 'SYM CROX 125' 'LEXMOTO DIABLO 125'
'PORSCHE 718' 'MERCEDES CLK CLASS' 'IVECO STRALIS' 'MERCEDES GLE CLASS'
'VOLKSWAGEN BEETLE' 'FIAT TIPO' 'LONGJIA LJ' 'LEXUS NX300' 'YAMAHA X-MAX'
'NISSAN PIXO' 'AUDI Q7' 'RENAULT TRAFIC' 'HONDA FR-V' 'SUZUKI GZ 125'
'MERCEDES AMG CLASS' 'AUDI S3' 'HONDA CB600F' 'HYUNDAI IX20' 'VOLVO XC60'
'BMW X6' 'VOLKSWAGEN TOUAREG' 'KYMCO AGILITY' 'HONDA ANC125'
'HYUNDAI SANTA FE' 'SYM JET 125' 'HONDA XL125V' 'DUCATI 899 PANIGALE'
'IVECO MODEL MISSING' 'SEAT ALTEA' 'RENAULT GRAND SCENIC' 'HONDA ST1300A'
'BMW M1' 'KAWASAKI ZR900' 'VOLVO XC40' 'MITSUBISHI LANCER' 'KTM 125 DUKE'
'CITROEN XSARA' 'FORD B-MAX' 'NISSAN PRIMERA' 'HONDA ANF125'
'JAGUAR XK SERIES' 'LEXUS RX450' 'SKODA KODIAQ' 'SUZUKI SPLASH'
'JAGUAR XE SERIES' 'MITSUBISHI FUSO CANTER' 'BMW Z4' 'JEEP CHEROKEE'
'TOYOTA HILUX' 'VOLKSWAGEN JETTA' 'HONDA CB500XA' 'VOLKSWAGEN LUPO'
'KIA SEDONA' 'HYUNDAI TUCSON' 'SUZUKI GSX 1300' 'NISSAN NV400'
'CITROEN C5' 'HONDA CBF600N' 'PIAGGIO MEDLEY 125' 'KIA RIO'
'HONDA MSX125' 'YAMAHA MT07' 'SKODA ROOMSTER' 'HONDA CBR 650 FA'
'SUZUKI GSF 650' 'KAWASAKI MODEL MISSING' 'QINGQI QM 125 GY'
'VOLKSWAGEN BORA' 'YAMAHA MW125' 'KIA PRO CEED' 'SUZUKI GSF 600'
'YAMAHA DELIGHT 125' 'KAWASAKI ZR750' 'CHRYSLER YPSILON'
'MITSUBISHI MIRAGE' 'MERCEDES SLK CLASS' 'SUZUKI GSF 1200'
'KAWASAKI KLE650' 'FORD TRANSIT COURIER' 'HONDA VFR800F' 'TOYOTA MR2'
'LEXUS RX300' 'CHRYSLER VOYAGER' 'HONDA CBR1000RR' 'MERCEDES V CLASS'
'YAMAHA YS 125' 'HYUNDAI GETZ' 'MITSUBISHI COLT' 'HYUNDAI MATRIX'
'TOYOTA PROACE' 'AUDI RS4' 'DACIA DUSTER' 'VOLVO V50' 'ALFA ROMEO MITO'
'DACIA LOGAN' 'VAUXHALL GRANDLAND' 'SEAT TARRACO' 'RENAULT TRUCKS MASTER'
'LAND ROVER MODEL MISSING' 'SKODA RAPID' 'CHEVROLET CRUZE'
'SSANGYONG RODIUS' 'FORD MUSTANG' 'YAMAHA MWS125' 'NISSAN PRIMASTAR'
'HONDA AFS1102SH' 'BMW S 1000' 'AUDI S4' 'VOLVO S80'
'RENAULT MODEL MISSING' 'KAWASAKI ZR800' 'HONDA CB600 HORNET'
'SUZUKI UH 125' 'ZONTES ZT 125' 'JAGUAR E-PACE' 'YAMAHA FJR 1300'
'KIA CARENS' 'KAWASAKI EX 650' 'BMW F 800' 'JEEP RENEGADE'
'HYUNDAI ACCENT' 'HYUNDAI KONA' 'HONDA CB1000' 'KAWASAKI ZX1000'
'HONDA CR-V' 'CHEVROLET LACETTI' 'KIA STONIC' 'MAZDA CX3' 'SEAT ATECA'
'BMW M2' 'RENAULT TWINGO' 'SUZUKI SX4' 'ALFA ROMEO 147' 'YAMAHA YZF R6'
'MAZDA MX-5' 'HONDA NSC50WH' 'VOLKSWAGEN CC' 'MERCEDES SL CLASS'
'HONDA CBR500' 'KAWASAKI EX650' 'SUZUKI GSR 750' 'TESLA MODEL 3'
'PEUGEOT KISBEE' 'QINGQI QM 125' 'BYD ENVIRO' 'HONDA CB500'
'LEXMOTO ISCA 125' 'DUCATI MULTISTRADA 1200' 'SUZUKI AN 400'
'MAZDA MODEL MISSING' 'KAWASAKI BR125' 'HYUNDAI IX35' 'BMW M5' 'ROVER 25'
'TESLA MODEL S' 'LEXMOTO XFLM 125' 'MITSUBISHI MODEL MISSING'
'BMW R 1250' 'VAUXHALL VIVA' 'SUZUKI GSF 1250' 'HONDA CB650'
'UM RENEGRADE' 'BMW X2' 'VOLKSWAGEN ARTEON' 'PEUGEOT HORIZON'
'FIAT BRAVO' 'SUZUKI GSXR 1000' 'HONDA SES125' 'VOLVO C 30'
'SUZUKI GSXS 1100' 'FORD KA+' 'HONDA CBR600F' 'BMW F 750'
'ALFA ROMEO 159' 'YAMAHA XT 600' 'SYM SYMPHONY' 'LEXUS IS200'
'NISSAN CABSTAR' 'KTM RC 125' 'VAUXHALL TIGRA' 'CHEVROLET AVEO'
'TOYOTA LANDCRUISER' 'TOYOTA HIACE' 'FIAT 500X' 'DACIA SANDERO'
'PEUGEOT 406' 'YAMAHA XT 125' 'SUZUKI JIMNY' 'DAIHATSU SIRION'
'FIAT SCUDO' 'LEXUS GS300' 'LEXMOTO HUNTER' 'MITSUBISHI ECLIPSE'
'KEEWAY RKS 125' 'CHRYSLER 300' 'SUZUKI GSXR 750' 'SUZUKI ALTO'
'LEXMOTO ECHO 50' 'TRIUMPH SPEED MASTER 865' 'AJS MODENA 125'
'VOLKSWAGEN MODEL MISSING' 'HONDA NSC50E' 'FORD STREETKA' 'CITROEN NEMO'
'DS DS4' 'BMW Z3' 'LAMBRETTA MODEL MISSING' 'HONDA NSS300A' 'YAMAHA R6'
'YAMAHA DT 125 R' 'SUZUKI DL 650' 'LEXUS GS 450' 'MITSUBISHI GRANDIS'
'JAGUAR XJ SERIES' 'JEEP GRAND CHEROKEE' 'HONDA CBR1100XX'
'RENAULT KOLEOS' 'TOYOTA GT86' 'FORD EDGE' 'HONDA CG125' 'VOLVO XC70'
'SUBARU IMPREZA' 'ROVER 75' 'BENELLI TORNADO' 'LEXMOTO TITAN 125'
'SINNIS ZS 125' 'LEXMOTO VIPER 125' 'VAUXHALL ANTARA' 'KAWASAKI ZX636'
'ISUZU TROOPER' 'ROVER 45' 'KEEWAY RK 125' 'KAWASAKI ZR1000'
'YAMAHA MT10' 'SUZUKI CELERIO' 'AUDI Q8' 'MG XS' 'FORD PUMA'
'WRIGHTBUS STREETDECK' 'MCC SMART' 'DAEWOO KALOS' 'FIAT SEICENTO'
'KTM 390 DUKE' 'VOLVO S90' 'NISSAN PATHFINDER' 'HYUNDAI COUPE'
'FORD ESCORT' 'SEAT MII' 'AUDI TTS' 'LEXMOTO VENOM 125' 'PEUGEOT 106'
'SYM JET' 'KIA XCEED' 'LEXMOTO XTR 125' 'KEEWAY SUPERLIGHT' 'SEAT ARONA'
'KTM 1290 SUPER ADVENTURE' 'TOYOTA CELICA' 'KAWASAKI ER650'
'KAWASAKI EN650' 'YAMAHA WR 125' 'VOLKSWAGEN T-CROSS' 'PEUGEOT 508'
'JAGUAR F TYPE' 'PEUGEOT 807' 'INFINITI Q30' 'BMW MODEL MISSING'
'GILERA RUNNER' 'HARLEY-DAVIDSON XL 883' 'SUZUKI BALENO' 'HYUNDAI AMICA'
'SUZUKI GSXR 1100' 'HONDA CBR500RA' 'HARLEY-DAVIDSON XL 1200'
'JEEP WRANGLER' 'TRIUMPH SPEED TRIPLE' 'AUDI RS6' 'KTM 1290 SUPERDUKE'
'LEXMOTO ZSX 125' 'LEXMOTO ASSAULT 125' 'SINNIS RSX 125' 'LEXMOTO ENIGMA'
'AUDI RS3' 'KEEWAY TX 125' 'TESLA MODEL X' 'YAMAHA XJR 1300'
'SUZUKI GSX 650' 'CITROEN DS4' 'SAAB 9-5' 'KTM 790 DUKE' 'FIAT FIORINO'
'YAMAHA FZS 600' 'VOLVO C70' 'APRILIA MODEL MISSING' 'APRILIA SR 50'
'LDV MAXUS' 'VOLKSWAGEN FOX' 'TOYOTA STARLET' 'KAWASAKI ZX1400'
'TRIUMPH DAYTONA 675' 'MG ZS' 'SUBARU LEGACY' 'CHRYSLER PT CRUISER'
'KAWASAKI BX125' 'MITSUBISHI CARISMA' 'MAN TGE' 'TRIUMPH SPRINT ST 1050'
'CHEVROLET KALOS' 'APRILIA RS 125' 'PEUGEOT SPEEDFIGHT' 'KAWASAKI ZX900'
'PEUGEOT MODEL MISSING' 'HYUNDAI I800' 'PULSE XF 125'
'JOHN DEERE 6100 SERIES' 'SUBARU OUTBACK' 'APRILIA TUONO'
'TRIUMPH ROCKET' 'CASE IH MODEL MISSING' 'VAUXHALL SIGNUM'
'VALTRA MODEL MISSING' 'BENELLI BN' 'OPTARE SOLO' 'CLAAS MODEL MISSING'
'MAZDA RX8' 'MASSEY FERGUSON MODEL MISSING' 'SEAT EXEO' 'MG 3'
'OPTARE VERSA' 'SEAT AROSA' 'MG ZR' 'JOHN DEERE MODEL MISSING'
'SSANGYONG KORANDO' 'LEXMOTO ZSB 125' 'PEUGEOT 306'
'TRIUMPH MODEL MISSING' 'CITROEN C-CROSSER' 'YAMAHA YQ 50 AEROX'
'PEUGEOT RCZ' 'TRIUMPH SPRINT ST' 'MITSUBISHI SPACE STAR'
'TRIUMPH SPEED TRIPLE 1050' 'SUBARU FORESTER' 'DAIHATSU TERIOS'
'JCB MODEL MISSING' 'HYUNDAI VELOSTER' 'NEW HOLLAND MODEL MISSING'
'NISSAN 350' 'ISUZU D-MAX' 'FENDT MODEL MISSING' 'JOHN DEERE 6200 SERIES'
'SUZUKI GS 500' 'LEXMOTO MICHIGAN' 'TRIUMPH THUNDERBIRD'
'SSANGYONG TIVOLI']
/n
------------------------------------------------
For vehicles we again observe the accident_reference, along with the make and model of the vehicle. This could be an interesting feature for car manufacturers who would like to see which models have the highest fatality rates. As we saw before, 28% of the datapoints for this feature are null. These might be cases involving pedestrians though, so let's take a look at that.
sns.set(font_scale = 1.5)
plt.figure(figsize=(25, 50))
plt.title('Ten most common vehicle models in 2020 accidents')
# Order the bars by frequency and keep only the ten most common values
sns.countplot(y=vehicles['generic_make_model'], order=vehicles['generic_make_model'].value_counts().iloc[:10].index)
plt.xlabel("Number of vehicles")
plt.ylabel("Make and model")
plt.show()
As expected, a lot of datapoints are -1. After concatenating, we can use vehicle_reference from casualty to check the distribution. For now, here are the remaining features.
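Before the merge, one way to see where the -1 values concentrate is to group them by vehicle_type. The following is a minimal sketch on made-up toy data; the helper `missing_share_by_group` and the toy values are assumptions for illustration, not part of the original analysis.

```python
import pandas as pd

# Toy stand-in for the vehicles table; all values are made up for illustration.
toy_vehicles = pd.DataFrame({
    "vehicle_type": [9, 9, 9, 1, 1, 3],
    "generic_make_model": ["AUDI A3", "-1", "FORD FOCUS", "-1", "-1", "HONDA SH125"],
})

def missing_share_by_group(df, group_col, value_col, sentinel="-1"):
    """Hypothetical helper: fraction of rows per group whose value equals the sentinel."""
    # Compare as text so that both the string "-1" and the integer -1 are caught.
    is_missing = df[value_col].astype(str).eq(sentinel)
    return is_missing.groupby(df[group_col]).mean().sort_values(ascending=False)

share = missing_share_by_group(toy_vehicles, "vehicle_type", "generic_make_model")
print(share)
```

On the real data the same call would use `vehicles` instead of `toy_vehicles`; vehicle types with a share close to 1 are those for which make and model are rarely recorded.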
sns.set(font_scale = 1.5)
fig, ax = plt.subplots(12,2, figsize=(25, 110))
i=0
for col in vehicles.select_dtypes(include=['float64','int64']):
    sns.distplot(vehicles[col],label=col,ax=ax[i//2][i%2])
    i=i+1
fig.show()
- accident_year, vehicle_reference: Descriptive features. vehicle_reference can be used to cross-reference with casualty.
- vehicle_type: Type of vehicle involved in the accident. Correlates to casualty_type from casualty.
- towing_and_articulation: Specifies whether the vehicle was towing an object. Almost all data is 0: No tow/articulation, but because there is no missing data the few towing cases can still be used.
- vehicle_manoeuvre: Type of manoeuvre the vehicle was performing, with a spike at 18: Going ahead/other.
- vehicle_direction_from: Direction of vehicle from impact perspective.
- vehicle_direction_to: Direction of impact from vehicle perspective.
- vehicle_location_restricted_lane: Specifies if the vehicle was in a restricted lane. Most cases are 0: City limits, not restricted. A small number of cases are 99: Unknown/self-reported.
- junction_location: Movement of the vehicle towards a junction, with most cases 0: Not at junction.
- skidding_and_overturning: Specifies if skidding was involved in the accident. This could correlate to road_surface_conditions and weather_conditions. Most cases are 0: None.
- hit_object_in_carriageway: Specifies what type of object was hit during the incident, with most cases 0: None.
- vehicle_leaving_carriageway: Specifies the movement of the vehicle after the impact, with most cases 0: Did not leave carriageway.
- hit_object_off_carriageway: Specifies what object was hit off the carriageway by the vehicle.
- first_point_of_impact: Which area of the vehicle was hit first. Most cases are 1: Front, with some cases being 0: No impact at all.
- vehicle_left_hand_drive: Most vehicles in the UK are right-hand drive, so most cases are 1: No. Some cases are 9: Unknown.
- journey_purpose_of_driver: Most cases are 6: Not known.
- sex_of_driver: Correlates to sex_of_casualty from casualty, with some cases 3: Unknown. This could be because of hit and runs.
- age_of_driver: Large spike at -1: Data missing.
- age_band_of_driver: Also a large spike for missing data, understandable because it correlates to the feature above.
- engine_capacity_cc: Also a large spike at -1 for missing data.
- propulsion_code: Also a large spike for missing data.
- age_of_vehicle: Also a large spike for missing data.
- driver_imd_decile: Also a large spike for missing data. Correlates to casualty_imd_decile if the casualty was the driver.
- driver_home_area_type: Correlates to casualty_home_area_type if the casualty was the driver.
A better look needs to be taken at some features after concatenation:
sex_of_driver, age_of_driver, age_band_of_driver, engine_capacity_cc, propulsion_code, age_of_vehicle
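To put numbers on the "large spike at -1" observations, the share of sentinel values per column can be computed directly. This is a minimal sketch on toy data; the helper name `sentinel_share` and the toy values are assumptions, and treating -1 as the only missing-value code is a simplification.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a few vehicles columns; values are made up for illustration.
toy = pd.DataFrame({
    "sex_of_driver": [1, 2, 3, 1],
    "age_of_driver": [32, -1, -1, 45],
    "engine_capacity_cc": [1968, -1, 1395, -1],
    "age_of_vehicle": [6, -1, 2, 8],
})

def sentinel_share(df, columns, sentinel=-1):
    """Hypothetical helper: fraction of rows equal to the sentinel, per column."""
    return df[columns].eq(sentinel).mean().sort_values(ascending=False)

share = sentinel_share(
    toy, ["age_of_driver", "engine_capacity_cc", "age_of_vehicle", "sex_of_driver"]
)
print(share)

# Replacing the sentinel with NaN lets pandas treat these cells as missing,
# for example in isna() counts or describe() statistics.
cleaned = toy.replace(-1, np.nan)
print(cleaned.isna().sum())
```

Converting to NaN also keeps -1 from distorting means and histograms of columns such as age_of_driver.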
We had a look at the data and found some irregularities. These will now be processed. As many features as possible have been kept for the best possible prediction. Later, the most important features will be displayed and narrowed down further by applying a Random Forest Classifier, which also serves as a check on the stability of the data.
characteristics = pd.read_csv('/Users/Matt/Desktop/AI/CHALLENGE1/Brit/dft-road-casualty-statistics-accident-2020.csv')
casualty = pd.read_csv('/Users/Matt/Desktop/AI/CHALLENGE1/Brit/dft-road-casualty-statistics-casualty-2020.csv')
vehicles = pd.read_csv('/Users/Matt/Desktop/AI/CHALLENGE1/Brit/dft-road-casualty-statistics-vehicle-2020.csv')
#ref = pd.read_excel('/Users/Matt/Desktop/AI/CHALLENGE1/Brit/Road-Safety-Open-Dataset-Data-Guide.xlsx')
characteristics = characteristics.set_index('accident_reference')
casualty = casualty.set_index('accident_reference')
vehicles = vehicles.set_index('accident_reference')
#Dropping features
characteristics = characteristics.drop(['time',
'local_authority_ons_district',
'local_authority_highway',
'lsoa_of_accident_location',
'accident_year',
'location_easting_osgr',
'location_northing_osgr',
'second_road_class',
'second_road_number',
'road_surface_conditions',
'trunk_road_flag'], axis=1)
vehicles = vehicles.drop(['accident_year',
#'vehicle_reference'
], axis=1)
casualty = casualty.drop(['accident_year',
#'vehicle_reference',
#'casualty_reference',
'pedestrian_road_maintenance_worker'], axis=1)
pd.set_option('display.max_rows',max(characteristics.shape[0],casualty.shape[0],vehicles.shape[0]))
pd.set_option('display.max_columns',max(characteristics.shape[1],casualty.shape[1],vehicles.shape[1]))
characteristics.name = 'characteristics'
casualty.name = 'casualty'
vehicles.name = 'vehicles'
datasets = [characteristics,casualty,vehicles]
for df in datasets:
    print ("The dataset",df.name,"has",df.shape[0],"rows and",df.shape[1],"columns")
The dataset characteristics has 91199 rows and 24 columns
The dataset casualty has 115584 rows and 15 columns
The dataset vehicles has 167375 rows and 25 columns
display(HTML('<h1>characteristics</h1>'))
display(characteristics.head())
display(HTML('<h1>vehicles</h1>'))
display(vehicles.head())
display(HTML('<h1>casualties</h1>'))
display(casualty.head())
| accident_reference | accident_index | longitude | latitude | police_force | accident_severity | number_of_vehicles | number_of_casualties | date | day_of_week | local_authority_district | first_road_class | first_road_number | road_type | speed_limit | junction_detail | junction_control | pedestrian_crossing_human_control | pedestrian_crossing_physical_facilities | light_conditions | weather_conditions | special_conditions_at_site | carriageway_hazards | urban_or_rural_area | did_police_officer_attend_scene_of_accident |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10219808 | 2020010219808 | -0.254001 | 51.462262 | 1 | 3 | 1 | 1 | 04/02/2020 | 3 | 10 | 6 | 0 | 6 | 20 | 0 | -1 | 9 | 9 | 1 | 9 | 0 | 0 | 1 | 3 |
| 10220496 | 2020010220496 | -0.139253 | 51.470327 | 1 | 3 | 1 | 2 | 27/04/2020 | 2 | 9 | 3 | 3036 | 6 | 20 | 9 | 2 | 0 | 4 | 1 | 1 | 0 | 0 | 1 | 1 |
| 10228005 | 2020010228005 | -0.178719 | 51.529614 | 1 | 3 | 1 | 1 | 01/01/2020 | 4 | 1 | 5 | 0 | 6 | 30 | 3 | 1 | 0 | 0 | 4 | 1 | 0 | 0 | 1 | 1 |
| 10228006 | 2020010228006 | -0.001683 | 51.541210 | 1 | 2 | 1 | 1 | 01/01/2020 | 4 | 17 | 3 | 11 | 6 | 30 | 0 | -1 | 0 | 4 | 4 | 1 | 0 | 0 | 1 | 1 |
| 10228011 | 2020010228011 | -0.137592 | 51.515704 | 1 | 3 | 1 | 2 | 01/01/2020 | 4 | 1 | 3 | 40 | 6 | 30 | 3 | 4 | 0 | 0 | 4 | 1 | 0 | 0 | 1 | 1 |
| accident_reference | accident_index | vehicle_reference | vehicle_type | towing_and_articulation | vehicle_manoeuvre | vehicle_direction_from | vehicle_direction_to | vehicle_location_restricted_lane | junction_location | skidding_and_overturning | hit_object_in_carriageway | vehicle_leaving_carriageway | hit_object_off_carriageway | first_point_of_impact | vehicle_left_hand_drive | journey_purpose_of_driver | sex_of_driver | age_of_driver | age_band_of_driver | engine_capacity_cc | propulsion_code | age_of_vehicle | generic_make_model | driver_imd_decile | driver_home_area_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10219808 | 2020010219808 | 1 | 9 | 9 | 5 | 1 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 9 | 6 | 2 | 32 | 6 | 1968 | 2 | 6 | AUDI Q5 | 4 | 1 |
| 10220496 | 2020010220496 | 1 | 9 | 0 | 4 | 2 | 6 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 1 | 2 | 1 | 45 | 7 | 1395 | 1 | 2 | AUDI A1 | 7 | 1 |
| 10228005 | 2020010228005 | 1 | 9 | 0 | 18 | -1 | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 6 | 3 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
| 10228006 | 2020010228006 | 1 | 8 | 0 | 18 | 1 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 44 | 7 | 1798 | 8 | 8 | TOYOTA PRIUS | 2 | 1 |
| 10228011 | 2020010228011 | 1 | 9 | 0 | 18 | 3 | 7 | 9 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 6 | 1 | 20 | 4 | 2993 | 2 | 4 | BMW 4 SERIES | -1 | -1 |
| accident_reference | accident_index | vehicle_reference | casualty_reference | casualty_class | sex_of_casualty | age_of_casualty | age_band_of_casualty | casualty_severity | pedestrian_location | pedestrian_movement | car_passenger | bus_or_coach_passenger | casualty_type | casualty_home_area_type | casualty_imd_decile |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 010219808 | 2020010219808 | 1 | 1 | 3 | 1 | 31 | 6 | 3 | 9 | 5 | 0 | 0 | 0 | 1 | 4 |
| 010220496 | 2020010220496 | 1 | 1 | 3 | 2 | 2 | 1 | 3 | 1 | 1 | 0 | 0 | 0 | 1 | 2 |
| 010220496 | 2020010220496 | 1 | 2 | 3 | 2 | 4 | 1 | 3 | 1 | 1 | 0 | 0 | 0 | 1 | 2 |
| 010228005 | 2020010228005 | 1 | 1 | 3 | 1 | 23 | 5 | 3 | 5 | 9 | 0 | 0 | 0 | 1 | 3 |
| 010228006 | 2020010228006 | 1 | 1 | 3 | 1 | 47 | 8 | 2 | 4 | 1 | 0 | 0 | 0 | 1 | 3 |
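The previews above show accident_reference as 10219808 in characteristics and vehicles but as 010219808 in casualty, which suggests the column was parsed as a number in some files and as text in another. A minimal sketch of forcing the key columns to text while reading follows; the inline CSV snippet and the helper `read_keys_as_text` are assumptions for illustration, and only the documented `dtype` argument of `pd.read_csv` is used.

```python
import io
import pandas as pd

# Tiny inline CSV standing in for one of the DfT files (illustrative values only).
csv_text = (
    "accident_index,accident_reference,vehicle_reference\n"
    "2020010219808,010219808,1\n"
    "2020010220496,010220496,1\n"
)

def read_keys_as_text(source, key_cols=("accident_index", "accident_reference")):
    """Hypothetical helper: read a CSV while keeping the key columns as strings."""
    return pd.read_csv(source, dtype={col: str for col in key_cols})

# Default parsing infers integers and silently drops the leading zero.
default = pd.read_csv(io.StringIO(csv_text))
# Explicit string dtype keeps the identifier exactly as written in the file.
as_text = read_keys_as_text(io.StringIO(csv_text))

print(default["accident_reference"].iloc[0])
print(as_text["accident_reference"].iloc[0])
```

Keeping identifiers as text in all three files means a merge key compares like with like, regardless of which file it came from.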
To see how the three datasets relate, we will look at a single accident with 2 casualties and 13 vehicles. This highlights the features that matter for linking the tables.
vehicles['accident_index'].value_counts().head()
2020430342787    13
2020461001195    12
202006L168994    11
2020622000862    11
2020410926545    10
Name: accident_index, dtype: int64
characteristics[characteristics['accident_index'].isin(['2020430342787'])]
| accident_reference | accident_index | longitude | latitude | police_force | accident_severity | number_of_vehicles | number_of_casualties | date | day_of_week | local_authority_district | first_road_class | first_road_number | road_type | speed_limit | junction_detail | junction_control | pedestrian_crossing_human_control | pedestrian_crossing_physical_facilities | light_conditions | weather_conditions | special_conditions_at_site | carriageway_hazards | urban_or_rural_area | did_police_officer_attend_scene_of_accident |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 430342787 | 2020430342787 | -0.698147 | 51.603779 | 43 | 2 | 13 | 2 | 25/10/2020 | 1 | 480 | 1 | 40 | 3 | 70 | 0 | -1 | 0 | 0 | 6 | 1 | 0 | 0 | 1 | 1 |
Based on the accident_index, every vehicle datapoint can be matched to the characteristics data. number_of_vehicles serves as the check: it is 13 for this accident, so we expect 13 rows in vehicles.
vehicles[vehicles['accident_index'].isin(['2020430342787'])]
| accident_reference | accident_index | vehicle_reference | vehicle_type | towing_and_articulation | vehicle_manoeuvre | vehicle_direction_from | vehicle_direction_to | vehicle_location_restricted_lane | junction_location | skidding_and_overturning | hit_object_in_carriageway | vehicle_leaving_carriageway | hit_object_off_carriageway | first_point_of_impact | vehicle_left_hand_drive | journey_purpose_of_driver | sex_of_driver | age_of_driver | age_band_of_driver | engine_capacity_cc | propulsion_code | age_of_vehicle | generic_make_model | driver_imd_decile | driver_home_area_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 430342787 | 2020430342787 | 1 | 9 | 0 | 18 | 7 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 1 | 1 | 1 | 40 | 7 | 1797 | 8 | 7 | TOYOTA PRIUS | 4 | 1 |
| 430342787 | 2020430342787 | 2 | 9 | 0 | 18 | 7 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 1 | 6 | 1 | 32 | 6 | 1422 | 2 | 16 | VOLKSWAGEN POLO | 5 | 1 |
| 430342787 | 2020430342787 | 3 | 9 | 0 | 18 | 7 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 1 | 6 | 1 | 29 | 6 | 2143 | 2 | 7 | MERCEDES C CLASS | 4 | 1 |
| 430342787 | 2020430342787 | 4 | 9 | 0 | 18 | 7 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 1 | 6 | 1 | 26 | 6 | 1596 | 1 | 11 | FORD FOCUS | 4 | 1 |
| 430342787 | 2020430342787 | 5 | 9 | 0 | 18 | 7 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 1 | 6 | 1 | 33 | 6 | -1 | -1 | -1 | -1 | 8 | 1 |
| 430342787 | 2020430342787 | 6 | 9 | 0 | 18 | 7 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 1 | 6 | 1 | 25 | 5 | 1598 | 2 | 9 | MINI COOPER | 8 | 3 |
| 430342787 | 2020430342787 | 7 | 9 | 0 | 18 | 7 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 1 | 6 | 1 | 36 | 7 | 999 | 1 | 0 | AUDI A3 | -1 | -1 |
| 430342787 | 2020430342787 | 8 | 9 | 0 | 18 | 7 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 1 | 6 | 1 | 56 | 9 | 1999 | 2 | 3 | JAGUAR XE SERIES | 7 | 1 |
| 430342787 | 2020430342787 | 9 | 9 | 0 | 18 | 7 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 1 | 6 | 1 | 37 | 7 | 1499 | 2 | 5 | FORD FOCUS | 5 | 1 |
| 430342787 | 2020430342787 | 10 | 9 | 0 | 18 | 7 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 1 | 6 | 1 | 36 | 7 | 1498 | 1 | 3 | VOLKSWAGEN GOLF | 3 | 1 |
| 430342787 | 2020430342787 | 11 | 9 | 0 | 18 | 7 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 1 | 6 | 1 | 32 | 6 | 1461 | 2 | 5 | RENAULT CLIO | 5 | 3 |
| 430342787 | 2020430342787 | 12 | 9 | 0 | 18 | 7 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 1 | 6 | 1 | 19 | 4 | 1950 | 2 | 1 | MERCEDES E CLASS | 1 | 1 |
| 430342787 | 2020430342787 | 13 | 9 | 0 | 18 | 7 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 1 | 6 | 1 | -1 | -1 | 1995 | 2 | 2 | BMW 3 SERIES | -1 | -1 |
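The same check can be run for every accident at once by comparing number_of_vehicles with the number of vehicle rows per accident_index. The sketch below uses toy frames; the helper `vehicle_count_mismatches` and all values are assumptions for illustration.

```python
import pandas as pd

# Toy stand-ins for the characteristics and vehicles tables (illustrative values).
toy_characteristics = pd.DataFrame({
    "accident_index": ["A1", "A2", "A3"],
    "number_of_vehicles": [2, 1, 3],
})
toy_vehicles = pd.DataFrame({
    "accident_index": ["A1", "A1", "A2", "A3", "A3"],
    "vehicle_reference": [1, 2, 1, 1, 2],
})

def vehicle_count_mismatches(characteristics, vehicles):
    """Hypothetical helper: accidents whose declared vehicle count differs from the rows found."""
    counted = vehicles.groupby("accident_index").size().rename("vehicle_rows")
    check = characteristics.set_index("accident_index")[["number_of_vehicles"]].join(counted)
    # Accidents with no vehicle rows at all get a count of zero instead of NaN.
    check["vehicle_rows"] = check["vehicle_rows"].fillna(0).astype(int)
    return check[check["number_of_vehicles"] != check["vehicle_rows"]]

mismatches = vehicle_count_mismatches(toy_characteristics, toy_vehicles)
print(mismatches)
```

An empty result on the real tables would confirm that number_of_vehicles and the vehicles file agree for every accident.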
Again the accident_index will be the main key, but vehicle_reference will be used to assign each casualty to a vehicle.
casualty[casualty['accident_index'].isin(['2020430342787'])]
| accident_reference | accident_index | vehicle_reference | casualty_reference | casualty_class | sex_of_casualty | age_of_casualty | age_band_of_casualty | casualty_severity | pedestrian_location | pedestrian_movement | car_passenger | bus_or_coach_passenger | casualty_type | casualty_home_area_type | casualty_imd_decile |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 430342787 | 2020430342787 | 1 | 1 | 2 | 1 | 73 | 10 | 2 | 0 | 0 | 2 | 0 | 9 | -1 | -1 |
| 430342787 | 2020430342787 | 1 | 2 | 1 | 1 | 40 | 7 | 3 | 0 | 0 | 0 | 0 | 9 | 1 | 4 |
Here we see the vehicle_reference that will be used. Both casualties were in the first vehicle. Furthermore, the age_of_casualty values are 73 and 40. Cross-referencing with the vehicles dataset, we find that the age_of_driver of vehicle 1 was 40, which matches the 40-year-old casualty.
Now the two datasets will be merged on the keys accident_index and vehicle_reference. This way each casualty can be linked to a vehicle.
df_casualty = pd.merge(casualty, vehicles, on=['accident_index', 'vehicle_reference'])
df_casualty.sample(10)
| | accident_index | vehicle_reference | casualty_reference | casualty_class | sex_of_casualty | age_of_casualty | age_band_of_casualty | casualty_severity | pedestrian_location | pedestrian_movement | car_passenger | bus_or_coach_passenger | casualty_type | casualty_home_area_type | casualty_imd_decile | vehicle_type | towing_and_articulation | vehicle_manoeuvre | vehicle_direction_from | vehicle_direction_to | vehicle_location_restricted_lane | junction_location | skidding_and_overturning | hit_object_in_carriageway | vehicle_leaving_carriageway | hit_object_off_carriageway | first_point_of_impact | vehicle_left_hand_drive | journey_purpose_of_driver | sex_of_driver | age_of_driver | age_band_of_driver | engine_capacity_cc | propulsion_code | age_of_vehicle | generic_make_model | driver_imd_decile | driver_home_area_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 52647 | 2020401006730 | 2 | 2 | 1 | 2 | 35 | 6 | 3 | 0 | 0 | 0 | 0 | 9 | 1 | 3 | 9 | 0 | 18 | 2 | 6 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 2 | 2 | 35 | 6 | 1995 | 2 | 0 | BMW 1 SERIES | 3 | 1 |
| 4456 | 2020030989808 | 1 | 2 | 2 | 1 | 66 | 10 | 3 | 0 | 0 | 1 | 0 | 9 | 3 | 6 | 9 | 0 | 18 | 6 | 2 | 0 | 0 | 0 | 0 | 7 | 10 | 1 | 1 | 5 | 1 | 59 | 9 | 2184 | 2 | 17 | NISSAN X-TRAIL | 6 | 3 |
| 45246 | 2020332001108 | 1 | 1 | 1 | 1 | 20 | 4 | 2 | 0 | 0 | 0 | 0 | 9 | -1 | -1 | 9 | 0 | 18 | 5 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 6 | 1 | 20 | 4 | -1 | -1 | -1 | -1 | -1 | -1 |
| 27622 | 2020161003043 | 2 | 1 | 1 | 1 | 45 | 7 | 3 | 0 | 0 | 0 | 0 | 98 | 1 | 1 | 98 | 0 | 13 | 7 | 3 | 0 | 1 | 5 | 0 | 7 | 0 | 4 | 1 | 1 | 1 | 45 | 7 | 2461 | 2 | 18 | -1 | 1 | 1 |
| 70083 | 2020460963475 | 1 | 1 | 1 | 2 | 62 | 9 | 3 | 0 | 0 | 0 | 0 | 9 | 1 | 3 | 9 | 0 | 9 | 5 | 3 | 0 | 2 | 0 | 0 | 0 | 0 | 3 | 1 | 5 | 2 | 62 | 9 | 1968 | 2 | 14 | AUDI A3 | 3 | 1 |
| 74131 | 2020470954512 | 2 | 1 | 1 | 2 | 58 | 9 | 3 | 0 | 0 | 0 | 0 | 9 | 1 | 6 | 9 | 0 | 18 | 2 | 6 | 0 | 8 | 0 | 0 | 0 | 0 | 2 | 1 | 5 | 2 | 58 | 9 | 1318 | 1 | 1 | HONDA JAZZ | 6 | 1 |
| 18833 | 2020122000899 | 1 | 1 | 2 | 1 | 5 | 1 | 3 | 0 | 0 | 2 | 0 | 9 | 1 | 4 | 9 | 0 | 9 | 1 | 6 | 0 | 8 | 0 | 0 | 0 | 0 | 1 | 1 | 6 | 2 | 40 | 7 | 1395 | 1 | 2 | SEAT LEON | 4 | 1 |
| 53282 | 2020410949978 | 1 | 1 | 1 | 1 | 27 | 6 | 3 | 0 | 0 | 0 | 0 | 5 | 1 | 3 | 5 | 0 | 16 | 1 | 5 | 0 | 0 | 1 | 0 | 7 | 0 | 4 | 1 | 6 | 1 | 27 | 6 | 1098 | 1 | 12 | -1 | 3 | 1 |
| 72318 | 2020461007536 | 2 | 2 | 2 | 2 | -1 | -1 | 3 | 0 | 0 | 1 | 0 | 9 | 1 | 8 | 9 | 0 | 4 | 7 | 3 | 0 | 1 | 0 | 0 | 0 | 0 | 2 | 1 | 5 | 2 | 28 | 6 | 1498 | 1 | 1 | FORD GRAND C-MAX | 8 | 1 |
| 45741 | 2020340E04912 | 2 | 1 | 1 | 2 | 53 | 8 | 3 | 0 | 0 | 0 | 0 | 1 | -1 | -1 | 1 | 0 | 18 | 8 | 4 | 0 | 1 | 5 | 0 | 0 | 0 | 1 | 1 | 6 | 2 | 53 | 8 | -1 | -1 | -1 | -1 | 8 | 3 |
df_casualty.shape
(92450, 38)
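We observe a decline in datapoints. Compared with the 167375 rows in vehicles, this is due to the fact that not every vehicle involved in an accident had a casualty. Note that the result is also smaller than the 115584 rows in casualty, which means some casualty rows found no matching vehicle in the inner merge; this is worth verifying. If we check back on our previous example, we see that the data transferred over precisely. Information about the vehicle and passengers matches.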
df_casualty[df_casualty['accident_index'].isin(['2020430342787'])]
| | accident_index | vehicle_reference | casualty_reference | casualty_class | sex_of_casualty | age_of_casualty | age_band_of_casualty | casualty_severity | pedestrian_location | pedestrian_movement | car_passenger | bus_or_coach_passenger | casualty_type | casualty_home_area_type | casualty_imd_decile | vehicle_type | towing_and_articulation | vehicle_manoeuvre | vehicle_direction_from | vehicle_direction_to | vehicle_location_restricted_lane | junction_location | skidding_and_overturning | hit_object_in_carriageway | vehicle_leaving_carriageway | hit_object_off_carriageway | first_point_of_impact | vehicle_left_hand_drive | journey_purpose_of_driver | sex_of_driver | age_of_driver | age_band_of_driver | engine_capacity_cc | propulsion_code | age_of_vehicle | generic_make_model | driver_imd_decile | driver_home_area_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 60809 | 2020430342787 | 1 | 1 | 2 | 1 | 73 | 10 | 2 | 0 | 0 | 2 | 0 | 9 | -1 | -1 | 9 | 0 | 18 | 7 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 1 | 1 | 1 | 40 | 7 | 1797 | 8 | 7 | TOYOTA PRIUS | 4 | 1 |
| 60810 | 2020430342787 | 1 | 2 | 1 | 1 | 40 | 7 | 3 | 0 | 0 | 0 | 0 | 9 | 1 | 4 | 9 | 0 | 18 | 7 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 1 | 1 | 1 | 40 | 7 | 1797 | 8 | 7 | TOYOTA PRIUS | 4 | 1 |
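The merged frame has 92450 rows while casualty has 115584, so it can help to audit which casualty rows fail to find a vehicle. A minimal sketch on toy data follows, using the `how`, `indicator` and `validate` arguments of `pd.merge`; the toy frames and their values are assumptions for illustration.

```python
import pandas as pd

# Toy stand-ins for the casualty and vehicles tables (illustrative values only).
toy_casualty = pd.DataFrame({
    "accident_index": ["A1", "A1", "A2", "A3"],
    "vehicle_reference": [1, 1, 2, 1],
    "age_of_casualty": [73, 40, 25, 31],
})
toy_vehicles = pd.DataFrame({
    "accident_index": ["A1", "A1", "A2"],
    "vehicle_reference": [1, 2, 1],
    "age_of_driver": [40, 32, 29],
})

keys = ["accident_index", "vehicle_reference"]

# A left merge keeps every casualty; the _merge column records whether a vehicle was found.
# validate="many_to_one" raises if a (accident_index, vehicle_reference) pair repeats in vehicles.
audit = pd.merge(toy_casualty, toy_vehicles, on=keys, how="left",
                 indicator=True, validate="many_to_one")
unmatched = audit[audit["_merge"] == "left_only"]

# The default inner merge, as used in the notebook, drops the unmatched casualties.
inner = pd.merge(toy_casualty, toy_vehicles, on=keys)

print(len(toy_casualty), len(inner), len(unmatched))
```

Inspecting the key columns of `unmatched` on the real data would show whether the lost rows share a pattern, for example a key stored with a different type in the two files.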
Next we merge the characteristics into each casualty case. This way we obtain an accurate description of the situation for each casualty.
crash = pd.merge(df_casualty, characteristics, on=['accident_index'])
crash.sample(10)
| | accident_index | vehicle_reference | casualty_reference | casualty_class | sex_of_casualty | age_of_casualty | age_band_of_casualty | casualty_severity | pedestrian_location | pedestrian_movement | car_passenger | bus_or_coach_passenger | casualty_type | casualty_home_area_type | casualty_imd_decile | vehicle_type | towing_and_articulation | vehicle_manoeuvre | vehicle_direction_from | vehicle_direction_to | vehicle_location_restricted_lane | junction_location | skidding_and_overturning | hit_object_in_carriageway | vehicle_leaving_carriageway | hit_object_off_carriageway | first_point_of_impact | vehicle_left_hand_drive | journey_purpose_of_driver | sex_of_driver | age_of_driver | age_band_of_driver | engine_capacity_cc | propulsion_code | age_of_vehicle | generic_make_model | driver_imd_decile | driver_home_area_type | longitude | latitude | police_force | accident_severity | number_of_vehicles | number_of_casualties | date | day_of_week | local_authority_district | first_road_class | first_road_number | road_type | speed_limit | junction_detail | junction_control | pedestrian_crossing_human_control | pedestrian_crossing_physical_facilities | light_conditions | weather_conditions | special_conditions_at_site | carriageway_hazards | urban_or_rural_area | did_police_officer_attend_scene_of_accident |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 36140 | 2020220977714 | 2 | 1 | 1 | 1 | 77 | 11 | 2 | 0 | 0 | 0 | 0 | 1 | 1 | 10 | 1 | 0 | 18 | 8 | 4 | 0 | 8 | 0 | 0 | 0 | 0 | 3 | 1 | 5 | 1 | 77 | 11 | -1 | -1 | -1 | -1 | 10 | 1 | -1.960806 | 52.352959 | 22 | 2 | 2 | 1 | 30/08/2020 | 1 | 270 | 6 | 0 | 6 | 30 | 3 | 4 | 0 | 0 | 1 | 1 | 0 | 0 | 2 | 1 |
| 16927 | 2020101004144 | 1 | 1 | 3 | 2 | 64 | 9 | 3 | 8 | 3 | 0 | 0 | 0 | -1 | -1 | 9 | 0 | 18 | 3 | 7 | 0 | 8 | 0 | 0 | 0 | 0 | 1 | 1 | 2 | 2 | 33 | 6 | 1956 | 2 | 8 | VAUXHALL INSIGNIA | 4 | 2 | -1.654753 | 54.969741 | 10 | 3 | 1 | 1 | 04/12/2020 | 6 | 147 | 4 | 1311 | 6 | 30 | 3 | 4 | 0 | 0 | 1 | 2 | 0 | 0 | 1 | 2 |
| 54735 | 2020420918164 | 2 | 1 | 2 | 2 | 53 | 8 | 3 | 0 | 0 | 1 | 0 | 9 | -1 | -1 | 9 | 0 | 4 | 5 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 2 | 1 | 6 | 1 | 58 | 9 | 1796 | 1 | 17 | BMW 3 SERIES | 1 | 1 | 0.486313 | 51.561470 | 42 | 3 | 2 | 3 | 10/01/2020 | 6 | 450 | 4 | 1464 | 6 | 30 | 3 | 4 | 0 | 0 | 7 | 1 | 0 | 0 | 1 | 2 |
| 71011 | 2020460981124 | 2 | 1 | 1 | 1 | 41 | 7 | 3 | 0 | 0 | 0 | 0 | 23 | 1 | 4 | 23 | 0 | 7 | 7 | 3 | 0 | 3 | 0 | 0 | 0 | 0 | 1 | 1 | 2 | 1 | 41 | 7 | -1 | -1 | -1 | -1 | 4 | 1 | 0.583802 | 51.336571 | 46 | 3 | 2 | 1 | 12/09/2020 | 7 | 544 | 3 | 278 | 3 | 50 | 1 | 4 | 0 | 4 | 4 | 1 | 0 | 0 | 2 | 1 |
| 81548 | 2020522002654 | 1 | 1 | 1 | 1 | 58 | 9 | 3 | 0 | 0 | 0 | 0 | 3 | 2 | 7 | 3 | 0 | 18 | 5 | 1 | 0 | 0 | 1 | 11 | 0 | 0 | 1 | 1 | 6 | 1 | 58 | 9 | 998 | 1 | 27 | -1 | 7 | 2 | -2.670526 | 51.470919 | 52 | 3 | 1 | 1 | 06/07/2020 | 2 | 605 | 6 | 0 | 6 | 40 | 0 | -1 | 0 | 0 | 1 | 1 | 5 | 2 | 2 | 2 |
| 41754 | 202031D080120 | 1 | 1 | 3 | 2 | 52 | 8 | 3 | 5 | 3 | 0 | 0 | 0 | 1 | 2 | 1 | 0 | 18 | 2 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 47 | 8 | -1 | -1 | -1 | -1 | -1 | -1 | -1.141808 | 52.987045 | 31 | 3 | 1 | 1 | 17/06/2020 | 4 | 343 | 3 | 60 | 6 | 30 | 0 | -1 | 0 | 4 | 1 | 1 | 0 | 0 | 1 | 1 |
| 9972 | 2020052002619 | 1 | 1 | 2 | 1 | 12 | 3 | 2 | 0 | 0 | 2 | 0 | 9 | 1 | 1 | 9 | 0 | 18 | 1 | 5 | 0 | 1 | 1 | 0 | 1 | 4 | 1 | 1 | 6 | 3 | -1 | -1 | 1968 | 2 | 9 | AUDI A5 | -1 | -1 | -3.129097 | 53.375865 | 5 | 2 | 1 | 1 | 18/12/2020 | 6 | 95 | 4 | 5139 | 6 | 20 | 3 | 4 | 0 | 0 | 4 | 1 | 0 | 0 | 1 | 1 |
| 35791 | 2020220963017 | 2 | 1 | 1 | 1 | 52 | 8 | 2 | 0 | 0 | 0 | 0 | 1 | 3 | 4 | 1 | 0 | 18 | 2 | 7 | 0 | 8 | 0 | 0 | 0 | 0 | 4 | 1 | 6 | 1 | 52 | 8 | -1 | -1 | -1 | -1 | 4 | 3 | -2.694440 | 52.073263 | 22 | 2 | 2 | 1 | 06/07/2020 | 2 | 285 | 3 | 4103 | 1 | 40 | 1 | 4 | 0 | 0 | 1 | 1 | 0 | 0 | 2 | 1 |
| 92328 | 2020990962113 | 1 | 2 | 2 | 1 | 39 | 7 | 2 | 0 | 0 | 1 | 0 | 9 | 3 | 5 | 9 | 0 | 13 | 8 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 6 | 1 | 41 | 7 | 2720 | 2 | 15 | LAND ROVER DISCOVERY | 5 | 3 | -4.445084 | 55.338184 | 99 | 2 | 3 | 6 | 05/07/2020 | 1 | 919 | 3 | 713 | 6 | 60 | 0 | -1 | 0 | 0 | 1 | 2 | 0 | 2 | 2 | 1 |
| 79997 | 2020501015157 | 2 | 1 | 1 | 1 | 17 | 4 | 3 | 0 | 0 | 0 | 0 | 3 | 1 | 7 | 3 | 0 | 18 | 8 | 4 | 0 | 8 | 5 | 0 | 0 | 0 | 1 | 1 | 6 | 1 | 17 | 4 | 125 | 1 | 0 | LEXMOTO LXR | 7 | 1 | -4.816643 | 50.512024 | 50 | 3 | 2 | 1 | 19/12/2020 | 7 | 596 | 3 | 389 | 6 | 60 | 3 | 4 | 0 | 0 | 1 | 1 | 0 | 0 | 2 | 1 |
crash.shape
(92450, 61)
pd.set_option('display.max.columns', None)
crash[crash['accident_index'].isin(['2020430342787'])]
| | accident_index | vehicle_reference | casualty_reference | casualty_class | sex_of_casualty | age_of_casualty | age_band_of_casualty | casualty_severity | pedestrian_location | pedestrian_movement | car_passenger | bus_or_coach_passenger | casualty_type | casualty_home_area_type | casualty_imd_decile | vehicle_type | towing_and_articulation | vehicle_manoeuvre | vehicle_direction_from | vehicle_direction_to | vehicle_location_restricted_lane | junction_location | skidding_and_overturning | hit_object_in_carriageway | vehicle_leaving_carriageway | hit_object_off_carriageway | first_point_of_impact | vehicle_left_hand_drive | journey_purpose_of_driver | sex_of_driver | age_of_driver | age_band_of_driver | engine_capacity_cc | propulsion_code | age_of_vehicle | generic_make_model | driver_imd_decile | driver_home_area_type | longitude | latitude | police_force | accident_severity | number_of_vehicles | number_of_casualties | date | day_of_week | local_authority_district | first_road_class | first_road_number | road_type | speed_limit | junction_detail | junction_control | pedestrian_crossing_human_control | pedestrian_crossing_physical_facilities | light_conditions | weather_conditions | special_conditions_at_site | carriageway_hazards | urban_or_rural_area | did_police_officer_attend_scene_of_accident |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 60809 | 2020430342787 | 1 | 1 | 2 | 1 | 73 | 10 | 2 | 0 | 0 | 2 | 0 | 9 | -1 | -1 | 9 | 0 | 18 | 7 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 1 | 1 | 1 | 40 | 7 | 1797 | 8 | 7 | TOYOTA PRIUS | 4 | 1 | -0.698147 | 51.603779 | 43 | 2 | 13 | 2 | 25/10/2020 | 1 | 480 | 1 | 40 | 3 | 70 | 0 | -1 | 0 | 0 | 6 | 1 | 0 | 0 | 1 | 1 |
| 60810 | 2020430342787 | 1 | 2 | 1 | 1 | 40 | 7 | 3 | 0 | 0 | 0 | 0 | 9 | 1 | 4 | 9 | 0 | 18 | 7 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 1 | 1 | 1 | 40 | 7 | 1797 | 8 | 7 | TOYOTA PRIUS | 4 | 1 | -0.698147 | 51.603779 | 43 | 2 | 13 | 2 | 25/10/2020 | 1 | 480 | 1 | 40 | 3 | 70 | 0 | -1 | 0 | 0 | 6 | 1 | 0 | 0 | 1 | 1 |
The merged dataset keeps its number of rows and gains the added features. We can now investigate it further and check which remaining features can be removed.
pd.set_option('display.max.columns', None)
crash = crash.drop([
#'accident_index', 'vehicle_reference', 'casualty_reference',
#'date','day_of_week', 'accident_severity', 'longitude', 'latitude',
#'police_force', 'number_of_casualties', 'local_authority_district', 'first_road_number',
#
], axis=1)
crash.head(10)
| | casualty_class | sex_of_casualty | age_of_casualty | age_band_of_casualty | casualty_severity | pedestrian_location | pedestrian_movement | car_passenger | bus_or_coach_passenger | casualty_type | casualty_home_area_type | casualty_imd_decile | vehicle_type | towing_and_articulation | vehicle_manoeuvre | vehicle_direction_from | vehicle_direction_to | vehicle_location_restricted_lane | junction_location | skidding_and_overturning | hit_object_in_carriageway | vehicle_leaving_carriageway | hit_object_off_carriageway | first_point_of_impact | vehicle_left_hand_drive | journey_purpose_of_driver | sex_of_driver | age_of_driver | age_band_of_driver | engine_capacity_cc | propulsion_code | age_of_vehicle | generic_make_model | driver_imd_decile | driver_home_area_type | number_of_vehicles | first_road_class | road_type | speed_limit | junction_detail | junction_control | pedestrian_crossing_human_control | pedestrian_crossing_physical_facilities | light_conditions | weather_conditions | special_conditions_at_site | carriageway_hazards | urban_or_rural_area | did_police_officer_attend_scene_of_accident |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 23 | 5 | 2 | 0 | 0 | 0 | 0 | 3 | -1 | -1 | 3 | 0 | 6 | 7 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 1 | 6 | 1 | 23 | 5 | 113 | 1 | 4 | 546 | -1 | -1 | 2 | 5 | 6 | 20 | 0 | -1 | 0 | 0 | 4 | 1 | 0 | 0 | 1 | 1 |
| 1 | 1 | 1 | 17 | 4 | 2 | 0 | 0 | 0 | 0 | 3 | 1 | 1 | 3 | 0 | 99 | 9 | 9 | 99 | 9 | 9 | 99 | 9 | 99 | 9 | 1 | 6 | 1 | 17 | 4 | 125 | 1 | 4 | 170 | 1 | 1 | 2 | 5 | 6 | 30 | 3 | 4 | 9 | 9 | 1 | 1 | 0 | 0 | 1 | 3 |
| 2 | 1 | 1 | 45 | 7 | 3 | 0 | 0 | 0 | 0 | 3 | 1 | 3 | 3 | 0 | 15 | 3 | 7 | 0 | 2 | 0 | 0 | 0 | 0 | 2 | 1 | 1 | 1 | 45 | 7 | 125 | 1 | 0 | 662 | 3 | 1 | 2 | 3 | 2 | 20 | 3 | 2 | 0 | 5 | 4 | 1 | 0 | 0 | 1 | 1 |
| 3 | 1 | 1 | 45 | 7 | 3 | 0 | 0 | 0 | 0 | 3 | 1 | 6 | 3 | 0 | 18 | 4 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 1 | 6 | 1 | 45 | 7 | 124 | 1 | 7 | 0 | 6 | 1 | 2 | 5 | 6 | 30 | 0 | -1 | 0 | 5 | 4 | 1 | 0 | 0 | 1 | 1 |
| 4 | 3 | 1 | 42 | 7 | 2 | 1 | 1 | 0 | 0 | 0 | 1 | 2 | 11 | 0 | 7 | 5 | 7 | 0 | 2 | 0 | 0 | 0 | 0 | 4 | 1 | 1 | 1 | 52 | 8 | 6700 | 2 | 2 | 4 | 3 | 1 | 1 | 6 | 6 | 20 | 3 | 2 | 0 | 5 | 4 | 8 | 0 | 0 | 1 | 1 |
| 5 | 1 | 2 | 21 | 5 | 3 | 0 | 0 | 0 | 0 | 3 | 1 | 7 | 3 | 0 | 99 | 9 | 9 | 99 | 9 | 9 | 99 | 9 | 99 | 1 | 1 | 6 | 2 | 21 | 5 | 125 | 1 | 6 | 210 | 7 | 1 | 2 | 3 | 9 | 20 | 5 | 4 | 0 | 0 | 7 | 2 | 0 | 0 | 1 | 3 |
| 6 | 1 | 1 | 28 | 6 | 3 | 0 | 0 | 0 | 0 | 9 | 1 | 2 | 9 | 0 | 18 | 5 | 1 | 0 | 1 | 0 | 4 | 0 | 0 | 3 | 1 | 2 | 1 | 28 | 6 | 998 | 1 | 3 | 558 | 2 | 1 | 2 | 6 | 6 | 30 | 3 | 4 | 0 | 0 | 1 | 2 | 0 | 0 | 1 | 1 |
| 7 | 1 | 1 | 41 | 7 | 3 | 0 | 0 | 0 | 0 | 3 | 1 | 5 | 3 | 0 | 18 | 1 | 5 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 41 | 7 | 125 | 1 | 0 | 669 | 5 | 1 | 2 | 4 | 6 | 20 | 3 | 2 | 2 | 4 | 4 | 2 | 0 | 0 | 1 | 1 |
| 8 | 1 | 1 | 19 | 4 | 3 | 0 | 0 | 0 | 0 | 9 | 1 | 6 | 9 | 0 | 16 | 5 | 1 | 0 | 3 | 5 | 0 | 0 | 0 | 1 | 1 | 6 | 1 | 19 | 4 | -1 | -1 | -1 | 0 | 6 | 1 | 1 | 3 | 1 | 40 | 1 | 4 | 0 | 0 | 4 | 2 | 0 | 0 | 1 | 1 |
| 9 | 1 | 1 | 36 | 7 | 3 | 0 | 0 | 0 | 0 | 90 | -1 | -1 | 90 | 0 | 18 | 7 | 3 | 7 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 6 | 1 | 36 | 7 | -1 | -1 | -1 | 0 | -1 | -1 | 2 | 5 | 9 | 30 | 0 | -1 | 9 | 9 | 7 | 1 | 9 | 9 | 1 | 3 |
This is the final dataset we will be working with. The last step before modelling is preprocessing, in which non-numerical values are encoded as numerical categories.
from sklearn.preprocessing import LabelEncoder
def encoding(df):
    label = LabelEncoder()
    for c in df.select_dtypes("object"):
        df[c]=df[c].astype("|S")
        df[c]=label.fit_transform(df[c])
    return df
def imputation(df):
    df = df.fillna(df.median())
    df = df.dropna()
    return df
def preprocessing(df):
    df = encoding(df)
    df = imputation(df)
    return df
crash = preprocessing(crash)
crash.head()
| | accident_index | vehicle_reference | casualty_reference | casualty_class | sex_of_casualty | age_of_casualty | age_band_of_casualty | casualty_severity | pedestrian_location | pedestrian_movement | car_passenger | bus_or_coach_passenger | casualty_type | casualty_home_area_type | casualty_imd_decile | vehicle_type | towing_and_articulation | vehicle_manoeuvre | vehicle_direction_from | vehicle_direction_to | vehicle_location_restricted_lane | junction_location | skidding_and_overturning | hit_object_in_carriageway | vehicle_leaving_carriageway | hit_object_off_carriageway | first_point_of_impact | vehicle_left_hand_drive | journey_purpose_of_driver | sex_of_driver | age_of_driver | age_band_of_driver | engine_capacity_cc | propulsion_code | age_of_vehicle | generic_make_model | driver_imd_decile | driver_home_area_type | longitude | latitude | police_force | accident_severity | number_of_vehicles | number_of_casualties | date | day_of_week | local_authority_district | first_road_class | first_road_number | road_type | speed_limit | junction_detail | junction_control | pedestrian_crossing_human_control | pedestrian_crossing_physical_facilities | light_conditions | weather_conditions | special_conditions_at_site | carriageway_hazards | urban_or_rural_area | did_police_officer_attend_scene_of_accident |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2 | 1 | 1 | 1 | 23 | 5 | 2 | 0 | 0 | 0 | 0 | 3 | -1 | -1 | 3 | 0 | 6 | 7 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 1 | 6 | 1 | 23 | 5 | 113 | 1 | 4 | 546 | -1 | -1 | -0.108858 | 51.403761 | 1 | 2 | 2 | 1 | 106 | 2 | 20 | 5 | 0 | 6 | 20 | 0 | -1 | 0 | 0 | 4 | 1 | 0 | 0 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 17 | 4 | 2 | 0 | 0 | 0 | 0 | 3 | 1 | 1 | 3 | 0 | 99 | 9 | 9 | 99 | 9 | 9 | 99 | 9 | 99 | 9 | 1 | 6 | 1 | 17 | 4 | 125 | 1 | 4 | 170 | 1 | 1 | -0.145519 | 51.546549 | 1 | 2 | 2 | 1 | 46 | 4 | 2 | 5 | 0 | 6 | 30 | 3 | 4 | 9 | 9 | 1 | 1 | 0 | 0 | 1 | 3 |
| 2 | 2 | 1 | 1 | 1 | 1 | 45 | 7 | 3 | 0 | 0 | 0 | 0 | 3 | 1 | 3 | 3 | 0 | 15 | 3 | 7 | 0 | 2 | 0 | 0 | 0 | 0 | 2 | 1 | 1 | 1 | 45 | 7 | 125 | 1 | 0 | 662 | 3 | 1 | -0.066682 | 51.497938 | 1 | 3 | 2 | 1 | 106 | 2 | 8 | 3 | 200 | 2 | 20 | 3 | 2 | 0 | 5 | 4 | 1 | 0 | 0 | 1 | 1 |
| 3 | 3 | 1 | 1 | 1 | 1 | 45 | 7 | 3 | 0 | 0 | 0 | 0 | 3 | 1 | 6 | 3 | 0 | 18 | 4 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 1 | 6 | 1 | 45 | 7 | 124 | 1 | 7 | 0 | 6 | 1 | -0.125965 | 51.437228 | 1 | 3 | 2 | 1 | 106 | 2 | 9 | 5 | 0 | 6 | 30 | 0 | -1 | 0 | 5 | 4 | 1 | 0 | 0 | 1 | 1 |
| 4 | 4 | 1 | 1 | 3 | 1 | 42 | 7 | 2 | 1 | 1 | 0 | 0 | 0 | 1 | 2 | 11 | 0 | 7 | 5 | 7 | 0 | 2 | 0 | 0 | 0 | 0 | 4 | 1 | 1 | 1 | 52 | 8 | 6700 | 2 | 2 | 4 | 3 | 1 | -0.055243 | 51.546633 | 1 | 2 | 1 | 1 | 106 | 2 | 4 | 6 | 0 | 6 | 20 | 3 | 2 | 0 | 5 | 4 | 8 | 0 | 0 | 1 | 1 |
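The encoding step above turns every text column into integer codes, but because a single `LabelEncoder` is refitted for each column, the mapping back to the original labels is not kept. The following is a minimal sketch of one way to keep that lookup, using `pd.factorize` instead of `LabelEncoder` (a swapped-in technique that also assigns codes in sorted order). The mini-frame and the names `demo`, `lookups` and `decoded` are hypothetical.

```python
import pandas as pd

# Hypothetical mini-frame with one text column, mimicking generic_make_model
demo = pd.DataFrame({
    "generic_make_model": ["TOYOTA PRIUS", "AUDI A5", "TOYOTA PRIUS", "BMW 3 SERIES"],
    "speed_limit": [70, 20, 70, 30],
})

# Keep a code-to-label lookup per column so the encoding can be reversed later
lookups = {}
for col in demo.select_dtypes(exclude="number"):
    codes, uniques = pd.factorize(demo[col], sort=True)
    demo[col] = codes
    lookups[col] = uniques

# Map the integer codes back to the original labels
decoded = lookups["generic_make_model"].take(demo["generic_make_model"].to_numpy()).tolist()
print(demo)
print(decoded)
```

With the lookup stored, an encoded value such as a make and model code in a feature-importance table can be translated back into a readable label.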
This is the final usable dataset. It could be exported as a .csv file and saved in a cloud environment for general-purpose use. That is not done here, because the modelling and evaluation take place in this document.
By concatenating the datasets we have added descriptive features for each casualty in a traffic accident. We can now plan how to predict casualty_severity. First, let's visualize its distribution. We previously saw that few cases involve fatal injuries, so the classes may be unevenly distributed and hard to use for machine learning as they are.
plt.figure(figsize=(15,6))
sns.countplot(x=crash['casualty_severity'].map({1:'Fatal',
                                                2:'Serious',
                                                3:'Slight'
                                                }))
plt.show()
crash['casualty_severity'].value_counts()
3    74323
2    16832
1     1295
Name: casualty_severity, dtype: int64
Here we can observe a clearly uneven distribution: about 80% of the casualties are slight, while only 1,295 are fatal. With so few fatal cases, a model has little to learn from for the most severe class. For the model to treat each severity equally, the classes should be represented evenly in the training data.
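To put numbers on the imbalance, the class shares can be computed directly from the counts printed above. The sketch below also derives the accuracy that a trivial model would reach by always predicting the most frequent class. The variable names are illustrative and not part of the original notebook.

```python
# Class counts copied from the value_counts() output above
counts = {3: 74323, 2: 16832, 1: 1295}
labels = {1: "Fatal", 2: "Serious", 3: "Slight"}

total = sum(counts.values())

# Share of each severity class in the full dataset
shares = {labels[k]: v / total for k, v in counts.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")

# Accuracy of a model that always predicts the most frequent class
majority_baseline = max(counts.values()) / total
print(f"Majority-class baseline accuracy: {majority_baseline:.3f}")
```

Any accuracy reported on the unbalanced data is best read against this baseline of roughly 0.80.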
To balance the classes, both down- and oversampling are applied. Downsampling draws a random subset of the larger classes so that each matches the size of the smallest class. Oversampling draws rows from the smaller classes with replacement until each matches the size of the largest class. Either way, the class counts end up equal.
# Class count
count_class_3, count_class_2, count_class_1 = crash['casualty_severity'].value_counts()
# Divide by class
df_class_1 = crash[crash['casualty_severity'] == 1]
df_class_2 = crash[crash['casualty_severity'] == 2]
df_class_3 = crash[crash['casualty_severity'] == 3]
df_class_2_under = df_class_2.sample(count_class_1,random_state=42)
df_class_3_under = df_class_3.sample(count_class_1,random_state=42)
df_under = pd.concat([df_class_2_under, df_class_3_under, df_class_1], axis=0)
df_class_1_over = df_class_1.sample(count_class_3, replace=True, random_state=42)
df_class_2_over = df_class_2.sample(count_class_3, replace=True, random_state=42)
df_over = pd.concat([df_class_2_over, df_class_3, df_class_1_over], axis=0)
fig,axes = plt.subplots(1,2,figsize=(20,6),sharey=True)
sns.countplot(ax=axes[0],x=df_under['casualty_severity'].map({1:'Fatal',
2:'Serious',
3:'Slight',
}))
axes[0].set_title('Random Downsampling')
sns.countplot(ax=axes[1],x=df_over['casualty_severity'].map({1:'Fatal',
2:'Serious',
3:'Slight',
}))
axes[1].set_title('Random Oversampling')
plt.show()
After downsampling, the model has fewer instances of each class to learn from. The chance of a wrong prediction is therefore greater, because two similar cases with different outcomes are harder to tell apart.
After oversampling, the same rows are repeated many times. The model may then simply memorise these repeated combinations and predict the outcome of a previously seen row.
To check for such irregularities, all three datasets are evaluated from this point on. This shows what impact down- or oversampling has on the accuracy.
Decision trees are among the most widely used algorithms for classification. To see whether this data is of any use, a first test is made with the Random Forest Classifier.
The data first has to be split into the target variable and the usable features. The original, oversampled, and downsampled datasets will be compared.
First the target variable is separated from the features in each dataset.
X = crash.drop('casualty_severity', axis=1)
y = crash['casualty_severity']
X_over = df_over.drop('casualty_severity', axis=1)
y_over = df_over['casualty_severity']
X_under = df_under.drop('casualty_severity', axis=1)
y_under = df_under['casualty_severity']
scaler = StandardScaler()
x_scaled = scaler.fit_transform(X)
x_scaled_over = scaler.fit_transform(X_over)
x_scaled_under = scaler.fit_transform(X_under)
Splitting the data into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
x_scaled, y, stratify=y, test_size=0.10, random_state=42
)
X_train_over, X_test_over, y_train_over, y_test_over = train_test_split(
x_scaled_over, y_over, stratify=y_over, test_size=0.10, random_state=42
)
X_train_under, X_test_under, y_train_under, y_test_under = train_test_split(
x_scaled_under, y_under, stratify=y_under, test_size=0.10, random_state=42
)
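The scaling cell above fits the `StandardScaler` on the full feature matrix before the split, so the test rows contribute to the mean and variance. A common alternative, sketched here on small synthetic data rather than the crash dataset, is to split first and let a pipeline fit the scaler on the training portion only. The data and the names `X_demo`, `y_demo` and `model` are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix and a three-class target
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(300, 5))
y_demo = rng.integers(1, 4, size=300)

# Split first, so the test rows stay unseen during fitting
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, stratify=y_demo, test_size=0.10, random_state=42
)

# The pipeline fits the scaler on X_tr only and reuses those statistics on X_te
model = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=20, random_state=42),
)
model.fit(X_tr, y_tr)

scaler = model.named_steps["standardscaler"]
print("Scaler fitted on", scaler.n_samples_seen_, "training rows")
print("Test predictions:", model.predict(X_te)[:5])
```

Tree ensembles are insensitive to feature scaling, so this mainly matters for scale-dependent models such as the KNN and logistic regression pipelines used later.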
Training models based on the 3 datasets.
classifier = RandomForestClassifier(n_estimators=100)
classifier_over = RandomForestClassifier(n_estimators=100)
classifier_under = RandomForestClassifier(n_estimators=100)
print(f'Started at: {datetime.now().strftime("%H:%M:%S")}')
classifier.fit(X_train, y_train)
classifier_over.fit(X_train_over, y_train_over)
classifier_under.fit(X_train_under, y_train_under)
print(f'Finished at: {datetime.now().strftime("%H:%M:%S")}')
Started at: 18:30:19
Finished at: 18:31:17
Now we can do our first predictions.
print(f'Started at: {datetime.now().strftime("%H:%M:%S")}')
y_pred = classifier.predict(X_test)
y_pred_over = classifier_over.predict(X_test_over)
y_pred_under = classifier_under.predict(X_test_under)
print(f'Finished at: {datetime.now().strftime("%H:%M:%S")}')
print("Accuracy normal:", accuracy_score(y_test, y_pred))
print("Accuracy oversampling:", accuracy_score(y_test_over, y_pred_over))
print("Accuracy undersampling:", accuracy_score(y_test_under, y_pred_under))
Started at: 19:48:09
Finished at: 19:48:11
Accuracy normal: 0.8097349918875067
Accuracy oversampling: 0.9730905502982464
Accuracy undersampling: 0.6066838046272494
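This matches what we expected: accuracy increases with oversampling and decreases with downsampling. An accuracy of 97% nevertheless seems very high, so some bias or instability may be involved. One likely cause is that the oversampled rows were duplicated before the split, so copies of the same row can appear in both the training and the test set.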
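The effect of the order of operations can be illustrated on a toy frame. The sketch below (hypothetical data, pandas only) compares oversampling before the split with holding out a stratified test set first and oversampling only the remaining rows. The column `row_id` and the helper `oversample` are assumptions introduced for this illustration.

```python
import pandas as pd

# Toy stand-in for the crash data: 8 fatal rows (1) and 40 slight rows (3);
# "row_id" plays the role of a unique record identifier
toy = pd.DataFrame({
    "row_id": range(48),
    "casualty_severity": [1] * 8 + [3] * 40,
})

def oversample(df, target="casualty_severity", random_state=42):
    """Resample the smaller classes with replacement up to the largest class size."""
    largest = df[target].value_counts().max()
    parts = [
        group if len(group) == largest
        else group.sample(largest, replace=True, random_state=random_state)
        for _, group in df.groupby(target)
    ]
    return pd.concat(parts)

# Variant A: oversample first, then split; duplicates can cross the split
over_first = oversample(toy).reset_index(drop=True)
test_a = over_first.sample(frac=0.25, random_state=0)
train_a = over_first.drop(index=test_a.index)
shared_a = set(train_a["row_id"]) & set(test_a["row_id"])

# Variant B: hold out a stratified test set first, then oversample the rest
test_b = toy.groupby("casualty_severity").sample(frac=0.25, random_state=0)
train_b = oversample(toy.drop(index=test_b.index))
shared_b = set(train_b["row_id"]) & set(test_b["row_id"])

print("Records in both train and test (oversample, then split):", len(shared_a))
print("Records in both train and test (split, then oversample):", len(shared_b))
```

In variant B the test set keeps the original class distribution, so its accuracy is comparable to the score on the unbalanced data.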
First, let's check some more statistics from the model.
confusion_matrix(y_test, y_pred)
array([[ 3, 33, 94],
[ 3, 259, 1421],
[ 2, 206, 7224]], dtype=int64)
# Get and reshape confusion matrix data
matrix = confusion_matrix(y_test, y_pred)
matrix = matrix.astype('float') / matrix.sum(axis=1)[:, np.newaxis]
# Build the plot
plt.figure(figsize=(16,7))
sns.set(font_scale=1.4)
sns.heatmap(matrix, annot=True, annot_kws={'size':10},
cmap=plt.cm.Greens, linewidths=0.2)
# Add labels to the plot
class_names = ['Fatal', 'Serious', 'Slight']
tick_marks = np.arange(len(class_names))
tick_marks2 = tick_marks + 0.5
plt.xticks(tick_marks2, class_names, rotation=25)
plt.yticks(tick_marks2, class_names, rotation=0)
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.title('Confusion Matrix for Random Forest Model')
plt.show()
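The row normalisation used for the heatmap can be checked by hand. As a small sketch, the code below takes the confusion matrix printed earlier for the unbalanced dataset and derives the per-class recall and the overall accuracy from it. The variable names are illustrative.

```python
import numpy as np

# Confusion matrix printed above (rows: true class, columns: predicted class)
cm = np.array([
    [3, 33, 94],
    [3, 259, 1421],
    [2, 206, 7224],
])
class_names = ["Fatal", "Serious", "Slight"]

# Dividing each row by its sum gives the row-normalised matrix shown in the heatmap
row_normalised = cm / cm.sum(axis=1, keepdims=True)

# The diagonal of the row-normalised matrix is the recall of each class
recall = np.diag(row_normalised)
for name, value in zip(class_names, recall):
    print(f"{name}: recall = {value:.3f}")

# Overall accuracy is the share of all cases on the diagonal
accuracy = np.trace(cm) / cm.sum()
print(f"Accuracy: {accuracy:.4f}")
```

The fatal row contains 130 cases, of which 3 are predicted as fatal, which is a recall of about 0.023.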
With so few cases there is little certainty about the severity. Fatal casualties in particular are rarely recognised: only 3 of the 130 fatal cases in the test set (about 2.3%) are predicted correctly. Compared with the oversampled results, the difference is large.
matrix = confusion_matrix(y_test_over, y_pred_over)
matrix = matrix.astype('float') / matrix.sum(axis=1)[:, np.newaxis]
# Build the plot
plt.figure(figsize=(16,7))
sns.set(font_scale=1.4)
sns.heatmap(matrix, annot=True, annot_kws={'size':10},
cmap=plt.cm.Greens, linewidths=0.2)
# Add labels to the plot
class_names = ['Fatal', 'Serious', 'Slight']
tick_marks = np.arange(len(class_names))
tick_marks2 = tick_marks + 0.5
plt.xticks(tick_marks2, class_names, rotation=25)
plt.yticks(tick_marks2, class_names, rotation=0)
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.title('Confusion Matrix for Random Forest Model')
plt.show()
With a balanced dataset there is barely any doubt left: in almost all cases the severity is predicted correctly. To make sure this is not caused by a biased feature, we take a look at the most important features.
# check Important features
feature_importances = pd.DataFrame(
{"feature": list(X.columns), "importance": classifier.feature_importances_}
).sort_values("importance", ascending=False)
feature_importances_over = pd.DataFrame(
{"feature": list(X_over.columns), "importance": classifier_over.feature_importances_}
).sort_values("importance", ascending=False)
feature_importances_under = pd.DataFrame(
    {"feature": list(X_under.columns), "importance": classifier_under.feature_importances_}
).sort_values("importance", ascending=False)
feature_importances_over
| | feature | importance |
|---|---|---|
| 2 | age_of_casualty | 0.068232 |
| 26 | age_of_driver | 0.054631 |
| 28 | engine_capacity_cc | 0.047861 |
| 31 | generic_make_model | 0.043555 |
| 30 | age_of_vehicle | 0.041958 |
| 10 | casualty_imd_decile | 0.039612 |
| 15 | vehicle_direction_to | 0.038464 |
| 32 | driver_imd_decile | 0.037849 |
| 14 | vehicle_direction_from | 0.037550 |
| 37 | speed_limit | 0.034799 |
| 13 | vehicle_manoeuvre | 0.032502 |
| 3 | age_band_of_casualty | 0.031964 |
| 8 | casualty_type | 0.028861 |
| 22 | first_point_of_impact | 0.027913 |
| 24 | journey_purpose_of_driver | 0.024835 |
| 27 | age_band_of_driver | 0.024353 |
| 34 | number_of_vehicles | 0.024342 |
| 35 | first_road_class | 0.024224 |
| 20 | vehicle_leaving_carriageway | 0.023227 |
| 11 | vehicle_type | 0.022254 |
| 38 | junction_detail | 0.020008 |
| 42 | light_conditions | 0.019428 |
| 17 | junction_location | 0.018434 |
| 18 | skidding_and_overturning | 0.016659 |
| 21 | hit_object_off_carriageway | 0.016467 |
| 9 | casualty_home_area_type | 0.016131 |
| 43 | weather_conditions | 0.015603 |
| 33 | driver_home_area_type | 0.014934 |
| 47 | did_police_officer_attend_scene_of_accident | 0.014864 |
| 36 | road_type | 0.012417 |
| 29 | propulsion_code | 0.012364 |
| 46 | urban_or_rural_area | 0.012326 |
| 1 | sex_of_casualty | 0.011348 |
| 39 | junction_control | 0.010564 |
| 25 | sex_of_driver | 0.010465 |
| 41 | pedestrian_crossing_physical_facilities | 0.010187 |
| 0 | casualty_class | 0.009765 |
| 4 | pedestrian_location | 0.009105 |
| 19 | hit_object_in_carriageway | 0.008623 |
| 5 | pedestrian_movement | 0.008537 |
| 6 | car_passenger | 0.007916 |
| 16 | vehicle_location_restricted_lane | 0.004900 |
| 45 | carriageway_hazards | 0.003007 |
| 44 | special_conditions_at_site | 0.002916 |
| 12 | towing_and_articulation | 0.001760 |
| 40 | pedestrian_crossing_human_control | 0.001345 |
| 23 | vehicle_left_hand_drive | 0.000602 |
| 7 | bus_or_coach_passenger | 0.000339 |
Here we see that age_of_casualty has the highest importance. This could be linked to younger people being involved in more dangerous situations, or to older people being more vulnerable. After that we mostly observe features from the vehicles dataset, which points to certain types of vehicles having higher rates of severe casualties.
Finally, there are no features that would give away the outcome directly, such as a feature from which the severity always follows (say, a car with 5 passengers always leading to a casualty).
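Impurity-based importances, as returned by `feature_importances_`, tend to favour features with many distinct values, such as ages, engine capacity, or an encoded make and model. Permutation importance on held-out data is a common cross-check. The sketch below uses synthetic data with hypothetical column names (`signal`, `noise_id`), not the crash dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: only "signal" determines the target, "noise_id" is a
# high-cardinality column with no relation to it (both names are hypothetical)
rng = np.random.default_rng(0)
n = 600
demo = pd.DataFrame({
    "signal": rng.integers(0, 2, size=n),
    "noise_id": rng.integers(0, 500, size=n),
})
target = demo["signal"]

X_tr, X_te, y_tr, y_te = train_test_split(demo, target, test_size=0.25, random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# Importance measured as the drop in test accuracy when a column is shuffled
result = permutation_importance(forest, X_te, y_te, n_repeats=5, random_state=0)
comparison = pd.DataFrame({
    "feature": demo.columns,
    "impurity_importance": forest.feature_importances_,
    "permutation_importance": result.importances_mean,
}).sort_values("permutation_importance", ascending=False)
print(comparison)
```

Applied to the fitted classifiers in this document, the same call would show whether the high-ranking numeric features keep their position when measured on the test set.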
print(classification_report(y_test_over, y_pred_over))
precision recall f1-score support
1 1.00 1.00 1.00 7432
2 0.94 0.98 0.96 7432
3 0.98 0.93 0.96 7433
accuracy 0.97 22297
macro avg 0.97 0.97 0.97 22297
weighted avg 0.97 0.97 0.97 22297
We can observe that the recall for slightly injured casualties (class 3) is the lowest at 0.93, meaning that this is where the model is least certain. At this stage it is hard to estimate which other features could improve the accuracy.
As we have seen, the Random Forest Classifier can predict the casualty severity in a traffic accident quite accurately. But what about other algorithms? Is there a chance to predict it more accurately? We start again by splitting the dataset into a training and a testing set.
trainset, testset = train_test_split(df_over, test_size=0.15, random_state=42)
X_train = trainset.drop('casualty_severity',axis=1)
y_train = trainset['casualty_severity']
X_test = testset.drop('casualty_severity',axis=1)
y_test = testset['casualty_severity']
preprocessor = make_pipeline(StandardScaler())
PCAPipeline = make_pipeline(preprocessor, PCA(n_components=3,random_state=0))
RandomPipeline = make_pipeline(preprocessor,RandomForestClassifier(random_state=0))
AdaPipeline = make_pipeline(preprocessor,AdaBoostClassifier(random_state=0))
#SVMPipeline = make_pipeline(preprocessor,SVC(random_state=0,probability=True))
KNNPipeline = make_pipeline(preprocessor,KNeighborsClassifier())
LRPipeline = make_pipeline(preprocessor,LogisticRegression(solver='sag'))
dict_of_models = {'KNN': KNNPipeline,
'RandomForest': RandomPipeline,
'AdaBoost': AdaPipeline,
#'SVM': SVMPipeline,
'LR': LRPipeline}
def evaluation(model):
    model.fit(X_train, y_train)
    # calculating the predictions
    y_pred = model.predict(X_test)
    print('Accuracy = ', accuracy_score(y_test, y_pred))
    print('-')
    print(confusion_matrix(y_test,y_pred))
    print('-')
    print(classification_report(y_test,y_pred))
    print('-')
Display Random forest logic
Grid search -> SVC or RandomForest -> Crossvalidation
for name, model in dict_of_models.items():
    print('---------------------------------')
    print(name)
    evaluation(model)
---------------------------------
KNN
Accuracy = 0.8352867308497279
-
[[11268 0 0]
[ 159 9755 1161]
[ 308 3881 6914]]
-
precision recall f1-score support
1 0.96 1.00 0.98 11268
2 0.72 0.88 0.79 11075
3 0.86 0.62 0.72 11103
accuracy 0.84 33446
macro avg 0.84 0.83 0.83 33446
weighted avg 0.84 0.84 0.83 33446
-
---------------------------------
RandomForest
Accuracy = 0.9705794414877713
-
[[11268 0 0]
[ 3 10883 189]
[ 9 783 10311]]
-
precision recall f1-score support
1 1.00 1.00 1.00 11268
2 0.93 0.98 0.96 11075
3 0.98 0.93 0.95 11103
accuracy 0.97 33446
macro avg 0.97 0.97 0.97 33446
weighted avg 0.97 0.97 0.97 33446
-
---------------------------------
AdaBoost
Accuracy = 0.5748669497099803
-
[[7641 2042 1585]
[3209 4608 3258]
[1591 2534 6978]]
-
precision recall f1-score support
1 0.61 0.68 0.64 11268
2 0.50 0.42 0.45 11075
3 0.59 0.63 0.61 11103
accuracy 0.57 33446
macro avg 0.57 0.57 0.57 33446
weighted avg 0.57 0.57 0.57 33446
-
---------------------------------
LR
Accuracy = 0.5514261795132452
-
[[7603 2211 1454]
[3387 4381 3307]
[1958 2686 6459]]
-
precision recall f1-score support
1 0.59 0.67 0.63 11268
2 0.47 0.40 0.43 11075
3 0.58 0.58 0.58 11103
accuracy 0.55 33446
macro avg 0.55 0.55 0.55 33446
weighted avg 0.55 0.55 0.55 33446
-
From these results, the Random Forest Classifier scores best, so we continue by optimizing it further to see whether even better results are possible.
from sklearn.model_selection import RandomizedSearchCV
RandomPipeline.get_params().keys()
dict_keys(['memory', 'steps', 'verbose', 'pipeline', 'randomforestclassifier', 'pipeline__memory', 'pipeline__steps', 'pipeline__verbose', 'pipeline__standardscaler', 'pipeline__standardscaler__copy', 'pipeline__standardscaler__with_mean', 'pipeline__standardscaler__with_std', 'randomforestclassifier__bootstrap', 'randomforestclassifier__ccp_alpha', 'randomforestclassifier__class_weight', 'randomforestclassifier__criterion', 'randomforestclassifier__max_depth', 'randomforestclassifier__max_features', 'randomforestclassifier__max_leaf_nodes', 'randomforestclassifier__max_samples', 'randomforestclassifier__min_impurity_decrease', 'randomforestclassifier__min_impurity_split', 'randomforestclassifier__min_samples_leaf', 'randomforestclassifier__min_samples_split', 'randomforestclassifier__min_weight_fraction_leaf', 'randomforestclassifier__n_estimators', 'randomforestclassifier__n_jobs', 'randomforestclassifier__oob_score', 'randomforestclassifier__random_state', 'randomforestclassifier__verbose', 'randomforestclassifier__warm_start'])
hyper_params = {
'randomforestclassifier__n_estimators':[10,100,150,250,400,600],
'randomforestclassifier__criterion':['gini','entropy'],
'randomforestclassifier__min_samples_split':[2,6,12],
'randomforestclassifier__min_samples_leaf':[1,4,6,10],
# 'auto' is equivalent to 'sqrt' for classifiers; the bare types int and float are not valid values
'randomforestclassifier__max_features':['sqrt','log2'],
'randomforestclassifier__verbose':[0,1,2],
'randomforestclassifier__class_weight':['balanced','balanced_subsample'],
'randomforestclassifier__n_jobs':[-1],
}
RF_grid = RandomizedSearchCV(RandomPipeline,hyper_params,scoring='accuracy',n_iter=5)
RF_grid.fit(X_train,y_train)
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 5.5s
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed: 14.1s finished
[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done 34 tasks | elapsed: 0.0s
[Parallel(n_jobs=8)]: Done 150 out of 150 | elapsed: 0.2s finished
RandomizedSearchCV(estimator=Pipeline(steps=[('pipeline',
Pipeline(steps=[('standardscaler',
StandardScaler())])),
('randomforestclassifier',
RandomForestClassifier(random_state=0))]),
n_iter=5,
param_distributions={'randomforestclassifier__class_weight': ['balanced',
'balanced_subsample'],
'randomforestclassifier__criterion': ['gini',
'entropy'],
'randomforestclassifier__max_features': ['auto',
'srqt',
'log2',
<class 'int'>,
<class 'float'>],
'randomforestclassifier__min_samples_leaf': [1,
4,
6,
10],
'randomforestclassifier__min_samples_split': [2,
6,
12],
'randomforestclassifier__n_estimators': [10,
100,
150,
250,
400,
600],
'randomforestclassifier__n_jobs': [-1],
'randomforestclassifier__verbose': [0,
1,
2]},
scoring='accuracy')
print(RF_grid.best_params_)
{'randomforestclassifier__verbose': 1, 'randomforestclassifier__n_jobs': -1, 'randomforestclassifier__n_estimators': 150, 'randomforestclassifier__min_samples_split': 6, 'randomforestclassifier__min_samples_leaf': 6, 'randomforestclassifier__max_features': 'log2', 'randomforestclassifier__criterion': 'entropy', 'randomforestclassifier__class_weight': 'balanced'}
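# Best pipeline found by the randomized search
best_forest = RF_grid.best_estimator_
# Fit on the full training set (the search already refits by default, so this repeats that step)
best_forest.fit(X_train, y_train)
# Predict on the held-out test set
y_pred = best_forest.predict(X_test)
# Learning curve: accuracy at 10 training-set sizes, with 4-fold cross-validation
N, train_score, test_score = learning_curve(best_forest, X_train, y_train,
                                            cv=4, scoring='accuracy',
                                            train_sizes=np.linspace(0.1, 1, 10))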
print('Accuracy = ', accuracy_score(y_test, y_pred))
print('-')
print(confusion_matrix(y_test,y_pred))
print('-')
print(classification_report(y_test,y_pred))
print('-')
plt.figure(figsize=(5,5))
plt.plot(N, train_score.mean(axis=1), label='train score')
plt.plot(N, test_score.mean(axis=1), label='validation score')
plt.legend()
plt.title('Accuracy')
plt.show()
Accuracy = 0.979369730311547
-
[[11268     0     0]
 [   98 10962    15]
 [   23   554 10526]]
-
              precision    recall  f1-score   support
           1       0.99      1.00      0.99     11268
           2       0.95      0.99      0.97     11075
           3       1.00      0.95      0.97     11103
    accuracy                           0.98     33446
   macro avg       0.98      0.98      0.98     33446
weighted avg       0.98      0.98      0.98     33446
-
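To make the link between the confusion matrix and the classification report explicit, the numbers above can be recomputed by hand. The sketch below copies the matrix from the printed output; the variable names are ours and are not part of the notebook.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows = true class, columns = predicted class, classes 1, 2, 3)
cm = np.array([[11268,     0,     0],
               [   98, 10962,    15],
               [   23,   554, 10526]])

# Accuracy: correctly classified samples (the diagonal) over all samples
accuracy = np.trace(cm) / cm.sum()

# Recall per class: diagonal divided by the row totals (true counts)
recall = np.diag(cm) / cm.sum(axis=1)

# Precision per class: diagonal divided by the column totals (predicted counts)
precision = np.diag(cm) / cm.sum(axis=0)

# F1 per class: harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy = {accuracy:.4f}")
for label, p, r, f in zip((1, 2, 3), precision, recall, f1):
    print(f"class {label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Most of the remaining error comes from the 554 samples of class 3 that are predicted as class 2, which is what lowers the recall of class 3 and the precision of class 2 to 0.95.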
Based on the experiments above, we can conclude that the Random Forest Classifier is the best choice for this case: on the oversampled data it reaches about 98% accuracy on the test set.
More extensive data cleaning could remove the remaining features that have little to no impact on the prediction, which may further improve the score.
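One possible way to find such low-impact features is to look at the impurity-based importances of a fitted forest. The sketch below is only an illustration: it uses synthetic data from `make_classification` rather than the accident data, and the cut-off (the mean importance) is an assumption, not a value taken from this notebook. With the pipeline used here, the fitted forest should be reachable through `best_forest.named_steps['randomforestclassifier']`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the notebook's feature matrix (assumption: not the real data)
X_demo, y_demo = make_classification(n_samples=500, n_features=10, n_informative=3,
                                     n_redundant=0, n_classes=3,
                                     n_clusters_per_class=1, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_demo, y_demo)

# Impurity-based importances; they sum to 1 across all features
importances = forest.feature_importances_

# Hypothetical cut-off: keep features with at least average importance
threshold = importances.mean()
keep = np.where(importances >= threshold)[0]
low_impact = np.where(importances < threshold)[0]

print("columns to keep:       ", keep)
print("candidates for removal:", low_impact)

# Reduced feature matrix that could be used to retrain the model
X_reduced = X_demo[:, keep]
```

Impurity-based importances can be biased towards features with many distinct values, so any candidate for removal would still need to be confirmed by retraining and comparing scores.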
Another possible improvement is to combine data from multiple years. This would have been expensive to compute, so the choice was made to evaluate data from a single year. With more computing power, combining all available years might give a better result.
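Combining years could look roughly like the sketch below. The two small frames are made-up stand-ins for per-year casualty files (in practice each would come from `pd.read_csv`), and the added `year` column is a hypothetical name used only to keep track of where each row came from.

```python
import pandas as pd

# Made-up stand-ins for per-year casualty tables (not real data)
frames_by_year = {
    2019: pd.DataFrame({"casualty_severity": [3, 2, 3]}),
    2020: pd.DataFrame({"casualty_severity": [3, 1]}),
}

# Tag every row with its year, then stack the tables into one frame
combined = pd.concat(
    [df.assign(year=year) for year, df in frames_by_year.items()],
    ignore_index=True,
)

print(combined)
print(combined.groupby("year").size())
```

Stacking only works cleanly when the yearly files share the same columns and coding, which would have to be checked before training on the combined frame.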
More extensive hyperparameter tuning is also possible: running the randomized search for more iterations could improve the accuracy. This was not done in this notebook because it is very time-consuming. The search was limited to 5 iterations, which take around 10 minutes.
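Before spending more iterations, the search space itself may deserve a second look. The parameter distributions printed above list 'srqt' (a misspelling of 'sqrt') and the bare types `int` and `float` as `max_features` candidates. As far as we know, scikit-learn rejects these values at fit time, so any sampled candidate using them would fail instead of being evaluated. Recent scikit-learn releases also appear to no longer accept 'auto'. In addition, `verbose` and `n_jobs` do not change the fitted model, so sampling them only spends iterations (and the selected `verbose=1` is what produces the long progress logs). The sketch below shows a cleaned-up search on synthetic data; the value lists are assumptions for illustration, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X_train / y_train (assumption: not the real data)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, n_informative=4,
                                     n_classes=3, n_clusters_per_class=1,
                                     random_state=0)

pipeline = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=0))

# Only parameters that influence the model, and only values the forest accepts
param_distributions = {
    "randomforestclassifier__class_weight": ["balanced", "balanced_subsample"],
    "randomforestclassifier__criterion": ["gini", "entropy"],
    "randomforestclassifier__max_features": ["sqrt", "log2", 0.5],
    "randomforestclassifier__min_samples_leaf": [1, 4, 6, 10],
    "randomforestclassifier__min_samples_split": [2, 6, 12],
    "randomforestclassifier__n_estimators": [10, 50, 100],
}

search = RandomizedSearchCV(
    pipeline,
    param_distributions,
    n_iter=5,             # raise this to explore more candidates
    cv=3,
    scoring="accuracy",
    random_state=0,       # makes the sampled candidates reproducible
    error_score="raise",  # fail loudly on an invalid candidate instead of scoring NaN
)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Setting `error_score="raise"` is a deliberate choice for this sketch: with the default setting a failing candidate is only reported as a warning, which is easy to miss when warnings are filtered.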
[1] UK Gov. Road Safety Data. (n.d.). Retrieved November 27, 2021, from https://data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-safety-data.
[2] What are the different types of road in the UK? Bituchem. (2021, January 27). Retrieved November 28, 2021, from https://www.bituchem.com/knowledge-hub/what-are-the-different-types-of-road-in-the-uk/.
[3] Government Digital Service. (2015, April 5). Speed limits in the UK. GOV.UK. Retrieved November 28, 2021, from https://www.gov.uk/speed-limits#:~:text=National%20speed%20limits,there%20are%20signs%20showing%20otherwise.
[4] Highways England. GOV.UK. (n.d.). Retrieved November 28, 2021, from https://www.gov.uk/government/organisations/highways-england.
[5] The English index of multiple deprivation (IMD) 2015 guidance. (n.d.). Retrieved November 28, 2021, from https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/464430/English_Index_of_Multiple_Deprivation_2015_-_Guidance.pdf.